BMVA Symposium 2024

This blog post is written by AI CDT student, Phillip Sloan

I had the opportunity to go to the British Machine Vision Association 2024 Symposium, which took place at the British Computer Society in London on the 17th of January, 2024. The symposium was chaired by Dr. Michael Wray from the University of Bristol, Dr. Davide Moltisanti from the University of Bath, and Dr. Tengda Han from the University of Oxford.

The day kicked off with three invited speakers, the first being Professor Hilde Kuehne from the University of Bonn and the MIT-IBM Watson AI Lab. Her presentation covered vision-language understanding for video. She started with an introduction to the field, how it began and how it has adapted over time, before moving on to the current work that she and her students have been doing, including the paper “MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge” by Wei Lin.

Her final remarks outlined potential issues for evaluation within the field. When the field was more focused on classification, simple labels could easily be judged right or wrong; now that it has moved to vision-language retrieval, the ground-truth caption might not actually be the best or most relevant caption in the dataset, a hurdle that must be overcome.

The second invited speaker, Professor Frank Keller from the University of Edinburgh, gave a very interesting talk on visual story generation, a domain in which a coherent narrative is constructed to describe a sequence of images, often centred on the characters within them. He broke his talk down into three sections, first introducing the field more concretely before explaining two different areas: characters in visual stories and planning in visual stories.

He emphasised that the characters within a story are important, so character detection and grounding are essential for generating a fluent story. To improve this aspect, Prof. Keller and his students introduced a dataset called VIST-Character that contains character groundings and visual and textual character co-reference chains. For planning the stories, Prof. Keller explained that their current methods utilise a blueprint, which focuses on localising characters in the text and images before relating them together. These blueprints are used as a guide to generate the story.

He explained that the domain is more difficult than image captioning: stories involve characters and require a fluent sequence of text, which makes current NLP metrics such as BLEU poor measures for the task, since the goal is to generate interesting, coherent and grounded stories rather than exact matches to the ground truth. His research instead used human evaluators, an interesting way to add humans to the loop.
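To see why exact-match metrics struggle here, consider a quick toy example (my own illustration, not from the talk) using NLTK’s BLEU implementation; the sentences are made up:

```python
# BLEU scores n-gram overlap with the reference, so a reasonable paraphrase
# of the same scene can score near zero. Sentences are made up.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "dog", "chased", "the", "ball", "across", "the", "garden"]]
paraphrase = ["a", "puppy", "ran", "after", "its", "toy", "outside"]

score = sentence_bleu(
    reference, paraphrase, smoothing_function=SmoothingFunction().method1
)
print(f"BLEU: {score:.3f}")  # near zero despite describing the same scene
```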

Following Prof. Keller’s talk we had a break for the poster session, before coming back for talks from a select few people who had brought posters to the symposium, including talks on the explainability of autonomous driving and on evaluating the reliability of LLMs in the face of adversarial attacks.

After lunch we had talks from the remaining two invited speakers. Professor Andrew Zisserman from the University of Oxford presented research on training visual language models to generate audio descriptions, helping people who are blind or partially sighted to enjoy movies.

The talk started with a brief introduction to the field and then outlined the currently available datasets, explaining that they were not sufficient, so a new dataset was created from AudioVault by processing its audio to extract audio descriptions and subtitles.

The talk walked us through a basic model architecture. Its limitations were pointed out, including that character names were often not used (the model often producing “he” or “it”) and that descriptions were often incomplete. Prof. Zisserman explained that, to combat these limitations, they took two research directions: improving “the who”, by providing supplementary information about the characters within the film, and “the what”, by improving the model’s ability to provide better context using pretrained video-language models.

Finally, he discussed how evaluation measures such as CIDEr are not fit for the purpose of audio description generation, explaining that large language models are starting to be used as evaluation tools in the domain.

The second talk of the afternoon, on vision-language learning with limited and no supervision, was presented by Dr. Yuki Asano from the University of Amsterdam, who asked the question: “Why care about Self-supervised Learning ideas in the age of CLIP et al?”

He presented three works undertaken by him and his team. The first, on the similarities of unimodal representations across language and vision, demonstrated a model that uncoupled image-language pairs and trained on them in an unsupervised fashion to reach 75% of the performance of CLIP. The second topic was localisation in visual language models, a task that vision-language models are not traditionally good at; his team’s solution was to unlock localisation abilities in frozen VLMs by adding a lightweight module called the positional insert (PIN) module.

The final part of the talk was on image encoder pretraining from a single video. Their model, called Dora (discover and track), is built on the high-level idea of tracking multiple objects across time and enforcing invariance of features over time. They evaluated their model against DINO, finding it to perform better on various datasets.

After a coffee break, we had some shorter talks from people presenting posters at the event, including a radiology report generation presentation which was particularly relevant to me. It proposed CXR-IRGen, a diffusion model used to generate extra image-report pairs, which could help address the lack of data within the field. Kevin Flanagan, a fellow CDT member, also presented his research into learning temporal sentence grounding from narrated egocentric videos, showcasing his method, CliMer, which merges clips from rough narration timestamps and trains in a contrastive manner.

Throughout the day we were encouraged to use Padlet to put our thoughts and questions down. After the talks had concluded there was a final informal Q&A session into the future of the vision-language domain which used our Padlet responses as talking points. We discussed points including the need for better evaluation metrics (which was a big theme from a lot of talks), the role of academia in the age of large language models and utilising NLP to make vision models explainable.

A very interesting and thought-provoking day! There were several people working within medical image analysis, so it was great to network and discuss ideas. Thank you to the speakers and presenters for their contributions, and to the chairs and organisers of the event for making it possible!


Collaborating with a Designer to Craft Visual Resources for my PhD Project

This blog post is written by AI CDT student, Vanessa Hanschke

With regards to AI, this mythologizing and enchantment is apparent when we explore the disjoint between the reality of the technology and its representation.

…says Beth Singler in her analysis of what she calls the AI Creation Meme: the ubiquitous image of a human hand and a robot hand reaching out to each other, index fingers extended, as in Michelangelo’s famous painting. Several researchers have commented on the bulk of images used to depict artificial intelligence, ranging from the inappropriate (e.g. an anthropomorphized robot for natural language processing) to the harmful (e.g. the unnecessarily sexist addition of breasts to illustrations of AI in the service industry).

Visual representations of AI matter. In a world where a lot of hype is being generated around AI in industry and policy, I think it is especially important for AI researchers to lead the way in creating better images that are grounded in more accurate conceptualizations of AI. This was one of the many reasons I decided to work with a designer to make visual materials that supported my research in responsible AI.

The Project

A little side note describing my research project: the Data Ethics Emergency Drill (DEED) is a method that we created to help industry practitioners of AI, machine learning and data science reflect on the societal impact of, and the values embedded in, their systems. The idea is similar to ethical roleplay. We created fictional scenarios of a data ethics dilemma, which members of a data and AI team discussed in a fake meeting. Each fictional scenario is crafted together with some members of the team to address their particular context and AI application, and is presented as an urgent problem that needs fixing within the fake meeting. After trialling this process with two separate industry data science teams, we made a toolbox for other industry teams to pick up and conduct their own drills. This toolbox consists of a PowerPoint slide deck and a Miro board template. We wanted to update these toolbox resources with a professional designer to make them visually engaging and accessible. We collaborated with Derek Edwards, a designer local to Bristol, to create the designs.

The Design Process

Designing is an iterative process, and it took some back and forth for the design to come together. Our initial ideas were very vague: we wanted it to be playful, as the DEED is about stepping outside of the day-to-day mindset that is focused on technical delivery. We wanted it to be about human developers’ responsibility for how we construct our technology today, as opposed to a long-term perspective granting AI human rights. Although “Emergency Drill” is in the title of the toolbox, it is not about hyping AI, but about establishing a safe space to reflect on the values embedded in the application.

Emergency exit signs served as a reference for our designs. Photo by Dids from Pexels: https://www.pexels.com/photo/emergency-exit-signage-1871343/

The original metaphor that we built this method on was a fire drill. A fire drill goes beyond just looking at the fire exits on a map; it is about experiencing evacuating a building with many people at once, and about practising collaboration between fire wardens, other security staff and everyone else. Similarly, the DEED goes beyond looking at a list of AI ethics principles to the concrete experience of discussing ethics and values, and understanding how responsible AI practices are distributed within a team.

The general look we were going for was inspired by video game arcades. Photo by Stanislav Kondratiev from Pexels: https://www.pexels.com/photo/video-arcade-games-5883539/

Because the outputs were visual, I found it helpful to use images to communicate my personal vision. I set up a folder where I would share material with Derek. Seeing some of Derek’s initial design drafts helped me clarify some of these ideas I had.

The Result

This is the final design of the title slide from the research project *drumroll*:

The final design of the logo on the slide deck for crafting scenarios.

It is inspired by the assembly point sign, which is a great metaphor for what the DEED takes from emergency drills: creating an opportunity for an industry team to come together to better understand what is necessary for their responsible AI practice. The colours were inspired by nineties arcade video games to add a playful element of technology pop culture.

Working with a designer such as Derek was a very gratifying process, and I enjoyed reflecting on which concepts of the DEED toolbox I wanted to transmit visually. The end products are a much more engaging workshop, a more user-friendly slide deck, and a more cohesive visual language for the project overall. I believe it will certainly help with getting more participant teams to engage with my research project.

Recommendations for PhD-Design Collaborations

 I would recommend design collaborations to any PhD student carrying out interactive research with visual artefacts. Here are some considerations that might guide your planning:

  • What outputs do you need, in what formats and how many? Some formats may be more suitable than others, depending on whether you need parts of your design to be editable as your research evolves. An elaborate design may be more striking, but will not always be modifiable (e.g. a hand-drawn script logo).
  • What is your timeline? The iterative process may take a few weeks, but having that back-and-forth is essential to creating a good design end product.
  • What do you want your thing to look like? Collect inspiration on Pinterest and websites that you like. Often online magazines will work with an array of interesting graphic designers. I found a lot of great AI-inspired art in articles of tech magazines.

Thanks

 I would like to thank Derek Edwards for the great collaboration and the Interactive AI CDT for funding this part of my research. You can find Derek’s portfolio here. If you are a data scientist, AI or ML engineer thinking about carrying out a Data Ethics Emergency Drill with your team, you can get in touch with me at vanessa.hanschke@bristol.ac.

Conference on Information and Knowledge Management (CIKM) – Matt Clifford

This blog post is written by AI CDT student, Matt Clifford

At the end of October ’23, I attended CIKM in Birmingham to present our conference paper. The conference was spread across three days, with multiple parallel tracks each day focusing on specific topic areas. CIKM is a medium-sized conference, which struck a good balance between being able to meet lots of researchers and not being so overwhelmingly big that you feel disconnected from the conference. CIKM spans many topics surrounding data science and mining, AI, ML, graph learning, recommendation systems and ranking systems.

This was my first time visiting Birmingham, dubbed by some the “Venice of the North”. Despite definitely not being in the north and resembling very little of Venice (according to some Venetians at the conference), I was overall very impressed with Birmingham. It has a much friendlier hustle and bustle compared to bigger cities in the UK, and the mixture of grand Victorian buildings interspersed with contemporary and art-deco architecture makes for an interesting and welcoming cityscape.

Our Paper

Our work focuses on explainable AI, which helps people get an idea of the inner workings of a highly complicated AI system. In our paper we investigate one of the most popular explainable AI methods, LIME. We discover situations where AI explanation systems like LIME become unfaithful, with the potential to misinform users. In addition, we illustrate a simple method to make an AI explanation system like LIME more faithful.

This is important because many users take the explanations provided by off-the-shelf methods, such as LIME, to be reliable. We show that the faithfulness of AI explanation systems can vary drastically depending on where and what a user chooses to explain. From this, we urge users to check whether an AI explanation system is likely to be faithful, and we empower them to construct more faithful AI explanation systems with our proposed change to the LIME algorithm.

 You can read the details of our work in our paper https://dl.acm.org/doi/10.1145/3583780.3615284
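For readers unfamiliar with LIME, the sketch below shows roughly how a single tabular prediction is explained with the lime package. The dataset and model are generic placeholders, not the setup from our paper:

```python
# Minimal LIME sketch for tabular data; dataset and model are placeholders.
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    mode="classification",
)

# LIME samples perturbations around this one instance and fits a local
# surrogate model; its faithfulness depends on where and what you explain.
explanation = explainer.explain_instance(
    data.data[0], model.predict_proba, num_features=5
)
print(explanation.as_list())  # top features with their local weights
```

Each returned weight describes the local surrogate fitted around that one instance, which is exactly why faithfulness can vary from instance to instance.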

Interesting Papers

At the conference there was lots of interesting work being presented. Below I’ll point towards some of the papers which stood out most to me from a variety of topic areas.

Fairness

  • “Fairness through Aleatoric Uncertainty” – focuses on improving model fairness in areas of aleatoric uncertainty, where it is not possible to increase model utility, so there is less of a fairness/utility tradeoff – https://dl.acm.org/doi/10.1145/3583780.3614875
  • “Predictive Uncertainty-based Bias Mitigation in Ranking” – reduces bias in ranking priority by reshuffling results based on the uncertainty of their rank position – https://dl.acm.org/doi/abs/10.1145/3583780.3615011

Explainability

Counterfactuals

Healthcare

Data Validity

Clustering package in Python

A group at the conference maintains a Python package which neatly collects many state-of-the-art clustering algorithms. Here is the link to the GitHub: https://github.com/collinleiber/ClustPy. Hopefully some people find it useful!


BIAS ’23 – Day 2: Huw Day Talk – Data Unethics Club

This blog post is written by CDT AI student Roussel Desmond Nzoyem

Let’s begin with a thought experiment. Imagine you are having a wonderful conversation with a long-time colleague. Towards the end of your conversation, they suggest an idea which you don’t have time to explore further. So you do what any of us would: you say, “email me the details”. When you get home, you receive an email from your colleague. But something is off. The writing in the email sounds different, far from how your friend normally expresses themselves. Who, or rather what, wrote the email?

When the line between human and artificial intelligence text generation becomes so blurred, don’t you wish you could tell whether a written text came from an artificial intelligence or from an actual human? What are the ethical concerns surrounding that?

Introduced by OpenAI in late 2022, ChatGPT continues its seemingly inevitable course in restructuring our societies. The second day of BIAS’23 was devoted to this impressive chatbot, from its fundamental principles to its applications and its implications. This was the platform for Mr Huw Day and his interactive talk titled Data Unethics Club.

Mr Day (soon to be a Dr, employed by the JGI institute) is a PhD candidate at the University of Bristol. Although Mr Day is a mathematics PhD student, that is not what comes across on first impression. The first thing one notices is his passion for ethics. He loves that stuff, as evidenced by the various blog posts he writes for the Data Ethics Club. By the end of this post, I hope you will want to join the Data Ethics Club as well.

Mr Day introduced his audience to many activities, beginning with a little guessing game as a warmup. The goal was to tell whether short lines were generated by ChatGPT or a human being. For instance:

How would you like a whirlwind of romance that will inevitably end in heartbreak?

If you guessed human, you were right! That archetypal cheesy line was in fact generated by one of Mr Day’s friends. Perhaps surprisingly, it worked! You might be forgiven for guessing ChatGPT, especially since the other lines from the bot sounded incredibly human.

The first big game introduced by Mr Day required a bit more collaboration than the warmup. The goal was to jailbreak ChatGPT into doing tasks that its maker, OpenAI, wouldn’t normally allow. The attendees had to trick ChatGPT into providing a detailed recipe for Molotov cocktails. As Mr Day ran around the room with a microphone to quiz his entertained audience, it became clear that the prevalent strategy was to disguise the shady query within a story. One audience member imagined a fantasy movie script in which a sorcerer (Glankor) taught his apprentice (Boggins) the recipe for the deadliest of weapons (see Figure 2).

Figure 1 – Mr Day introducing the jailbreaking challenge.

Figure 2 – ChatGPT giving away the recipe for a Molotov cocktail (courtesy of Mr Kipp McAdam Freud)

For the second activity, Mr Day presented the audience with the first half of a paper’s abstract. As in the warmup activity, the goal was to guess which of the two proposed texts for the second half came from ChatGPT, and which came from a human (presumably the same human who wrote the first half). For instance, the first part of one abstract reads (Shannon et al. 2023):

Reservoir computing (RC) promises to become as equally performing, more sample efficient, and easier to train than recurrent neural networks with tunable weights [1]. However, it is not understood what exactly makes a good reservoir. In March 2023, the largest connectome known to man has been studied and made openly available as an adjacency matrix [2].

Figure 3 – Identifying the second half of an abstract written by ChatGPT

As can be seen in Figure 3, Mr Day disclosed which proposal for the second part of the abstract ChatGPT was responsible for. For this particular example, Mr Day divulged something interesting he used to tell them apart: the acronym for reservoir computing (RC) is redefined, despite the fact that it was already defined in the first half. No human researcher would normally do that!

A few other examples of abstracts were looked at, including Mr Day’s own work in progress towards his thesis, and the Data Ethics Club’s whitepaper, each time quizzing the audience to understand how they were able to spot ChatGPT. The answers ranged from the very subjective, like “the writing not feeling like a human’s”, to the quite objective, like “the writing being too high-level, not expert enough”.

This led into the final activity of the talk, based on the game Spot the Liar! Our very own Mr Riku Green volunteered to share with the audience how he used ChatGPT in his daily life. The audience had to guess, based on questions asked of Mr Green, whether the outlandish task he described actually took place. Now, if you’ve spent a day with Mr Green, you’d know how obsessed he is with ChatGPT. So when Mr Green recounted that he’d used ChatGPT to provide tech support to his father, the room correctly guessed that he was telling the truth. That said, nobody could have guessed that Mr Green would use ChatGPT to write a breakup text.

Besides the deeper understanding of ChatGPT that the audience gained from this talk, one of the major takeaways was a set of tips and tell-tale signs of a ChatGPT production, and of a “liar” who uses it: repeated acronyms, too many adjectives, combinations of concepts which normally aren’t compatible, over-flattering language, and claims of novelty which the author of the underlying work wouldn’t even make. These are all flags that should signal to the reader that the text they are engaging with might have been generated by an AI.

All these activities, along with the moral implications involved in each, served as the stepping stone for Mr Day to present the Data Ethics Club. This is a welcoming community of academics, enthusiasts, industry experts and more, who voice their ethical concerns and question the moral implications of AI. They boast a comprehensive list of online resources, along with blog posts on their website to get people started. They are based at the University of Bristol, but open to all, as stated on their website: https://dataethicsclub.com/. Although the games outlined above are not part of the activities they carry out during their bi-weekly hour-long Zoom meetings, they keep each of their gatherings fresh and engaging. In fact, Mr Day’s organising team has been so successful that other companies (under confidential arrangements) are trying to replicate their model in-house. If you want to establish your own Data Ethics Club, look no further than the paper titled Data Ethics Club: Creating a collaborative space to discuss data ethics.

References:

Shannon, A., Green, R., Roberts, K. (2023). Insects In The Machine – Can tiny brains achieve big results in reservoir computing? Personal notes. Retrieved 8 September 2023.

BIAS ’23 – Day 1: Dr Kacper Sokol talk – The Difference Between Interactive AI and Interactive AI

This blog is written by CDT AI PhD student Beth Pearson

Day 1 of the Bristol Interactive AI Summer School (BIAS) ended with a thought-provoking talk by Dr. Kacper Sokol on The Difference Between Interactive AI and Interactive AI. Kacper began by explaining that the social sciences have decades’ worth of research on how humans reason and explain. Now, with an increasing demand for AI and ML systems to become more human-centered, with a focus on explainability, it makes sense to use insights from the social sciences to guide the development of these models.

Humans often explain things in a contrastive and social manner, which has led to counterfactual explanations being introduced by AI and ML researchers. Counterfactuals are statements relating to what has not happened or is not the case, for example: “If I hadn’t taken a sip of this hot coffee, I wouldn’t have burned my tongue.” Counterfactual explanations have the advantage of being suitable for both technical and lay audiences; however, they only provide information about one choice that the model makes, so they can bias the recipient.
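As a toy illustration of the idea (mine, not from the talk), a counterfactual for a simple classifier can be found by searching for a small change to one feature that flips the prediction; practical counterfactual generators are far more principled than this brute-force sketch:

```python
# Toy counterfactual search: nudge one feature at a time until the model's
# decision flips. Purely illustrative; real methods optimise for minimal,
# plausible changes rather than scanning a grid.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf = LogisticRegression().fit(X, y)

x = X[0]
original = int(clf.predict(x.reshape(1, -1))[0])

found = False
for i in range(X.shape[1]):
    for delta in np.linspace(-3.0, 3.0, 61):
        x_cf = x.copy()
        x_cf[i] += delta
        if int(clf.predict(x_cf.reshape(1, -1))[0]) != original:
            print(f"If feature {i} changed by {delta:+.2f}, "
                  f"the prediction would flip from class {original}.")
            found = True
            break
    if found:
        break
```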

Kacper then described his research focus on pediatric sepsis. Sepsis is a life-threatening condition that develops from an infection and is the third leading cause of death worldwide. Pediatric sepsis specifically refers to cases occurring in children. Sepsis is a particularly elusive disease because it can manifest differently in different people, and patients respond differently to treatments, making it challenging to identify the best treatment strategy for a specific patient. Kacper hopes that AI will be able to help solve this problem.

Importantly, the AI being applied to the pediatric sepsis problem is interactive and aims to support and work alongside humans rather than replace them. It is crucial that the AI aligns with the current clinical workflow so that it can be easily adopted into hospitals and GP practices. Kacper highlights that this is particularly important for pediatricians as they have been highly skeptical of AI in the past. However, now that AI has proven successful in adult branches of medicine, they are starting to warm to the idea.

Pediatric sepsis comes with many challenges. Pediatric sepsis has less data available than adult sepsis, and there is rapid deterioration, meaning that early diagnosis is vital. Unfortunately, there are many diseases in children that mimic the symptoms of sepsis, making it not always easy to diagnose. One of the main treatments for sepsis is antibiotics; however, since children are a vulnerable population, we don’t want to administer antibiotics unnecessarily. Currently, it is estimated that 98% of children receive antibiotics unnecessarily, which is contributing to antimicrobial resistance and can cause drug toxicity.

AI has the potential to help with these challenges; however, the goal is to augment, not disrupt, the current workflow. Humans can have great intuition and can observe cues that lead to excellent decision-making, which is particularly valuable in medicine. An experiment was carried out on nurses in neonatal care, which showed that nurses were able to correctly predict which infants were developing life-threatening infections without having any knowledge of the blood test results. Despite being able to identify the disease, the nurses were unable to explain their judgment process. The goal is to add automation from AI but still retain certain key aspects of human decision-making.

How much and where the automation should take place is not a simple question, however. You could replace biased humans with algorithms, but algorithms can also be biased, so this wouldn’t necessarily improve anything. Another option would be to have algorithms propose decisions and have humans check them; however, this still requires humans to carry out mundane tasks. Would it really be better than no automation at all? Kacper then asks: if you can prove an AI model is capable of predicting better than a human, and a human decides to use their own judgment to override the model, could it be considered malpractice?

Another proposed solution for implementing interactive AI is to have humans make the decision, with the AI model presenting arguments for and against that decision to help the human decide whether to change their mind or not.

The talk ends by discussing how interactive AI may be deployed in real-life scenarios. Since the perfect integration of AI and humans doesn’t quite exist yet, Kacper suggests that clinical trials might be a good idea, where suggestions made by AI models are marked as ‘for research only’ to keep them separated from other clinical workflows.

BIAS ’23 – Day 3: Dr Daniel Schien talk – Sustainability of AI within global carbon emissions

This blog post is written by AI CDT student Phillip Sloan

After a great presentation by Dr Dandan Zhang, Dr Daniel Schien presented a keynote on the carbon footprint of AI within the global carbon emissions of ICT. The presentation provided a reflection on AI’s role within climate change.

The keynote started by stating that the effects of climate change are becoming more noticeable. It is understandable that we might become numb to the constant barrage of climate change reports in the news, but the threat is still present, and it is one of the biggest challenges we face today. As engineers, we have a duty to reduce our impact where possible. The Intergovernmental Panel on Climate Change (IPCC) models the effects of global climate change, demonstrating many potential futures depending on how well we limit our carbon emissions. It is now accepted that we can no longer stop climate change, and the focus has shifted to limiting its effects, with an aim of keeping the global temperature increase to 2 degrees. The IPCC has modelled the impact until 2100, across various regions and a range of impact areas.

Currently, global emissions are approximately 50 gigatonnes of carbon dioxide equivalent (GtCO2e), which needs to be reduced significantly. This is total consumption, including sectors such as energy production, agriculture and general industry. Many governments have legislated on carbon emissions, introducing CO2 emission standards for cars and vans, renewable energy directives, and land use and forestry regulation. The main goal is a 50% reduction in carbon emissions by 2030.

ICT’s share of global greenhouse gas (GHG) emissions is 2.3%, with data centres, where a lot of AI algorithms are run, creating a large proportion of these emissions. Do we need to worry about AI’s contribution to climate change? The keynote highlighted that 20-30% of all data centre energy consumption is related to AI, and that the energy consumption of the ChatGPT model alone is equivalent to that of 175,000 households! These figures are expected to get worse, with the success of AI causing an increase in demand, further increasing AI’s energy consumption. The keynote also highlighted that the impact of AI comes not just from training and inference, but also from the construction of the data centres and equipment, such as graphics cards.

A conceptual model of the effects of ICT on carbon emissions was presented. The model described three effects that ICT has on carbon consumption: direct effects, enabling (indirect) effects and systemic effects. Direct effects relate to the technology being developed, its production, use and disposal. Enabling effects relate to its application, providing induction and obsolescence effects. Systemic effects relate to behavioural and structural change resulting from these applications.

So, what can be done to reduce the environmental impact of AI? In the development of AI systems, efficiency improvements, such as utilising more energy-efficient models and hardware, reduce energy consumption and improve the carbon footprint. Using green energy is also important for your carbon footprint. Dr Schien noted that the UK has acted on this, implementing regulation to promote wind and solar energy in the hope of decarbonising the electricity grid: the average carbon intensity has moved from around 250 gCO2e/kWh down to 50, showing the effect of the UK government’s efforts.

Despite its significant energy consumption, AI can be used to make systems more efficient, reducing the energy consumption of other systems. For example, AI-powered applications can tell power systems to switch to batteries during times when tariffs are higher (peak load shifting), or when grid power usage reaches a certain alternating current limit (AC limit).

During the Q&A, an interesting question was put forward: at what point should sustainability be thought of? When developing a model, or further down the pipeline?

Dr Schien answered that you should always consider which model to use. Can you avoid a deep learning model and use something simpler, like linear regression or a random forest? You can also avoid waste in your models; reducing the number of layers or changing architectures can be useful. Generally, thinking about only using what you need is an important mindset for improving your AI carbon footprint. An important note was that a lot of efficiencies are now being coded into frequently used libraries, which helps development as they come automatically. Finally, seeking to work for companies that are mindful of energy consumption and emissions will put pressure on firms to consider these issues in order to attract talented staff.

Dr Daniel Schien is a senior lecturer at the University of Bristol. His research focuses on improving our understanding of the environmental impact of information and communication technologies (ICT), and on reducing that impact. We would like to thank him for his thoughtful presentation on the effect of AI on climate change, and the discussions it provoked.

BIAS ’23 – Day 3: Prof. Kerstin Eder talk – (Trustworthy Systems Laboratory, University of Bristol) The AI Verification Challenge

This blog post is written by AI CDT student, Isabella Degen

A summary of Prof. Kerstin Eder’s talk on the well-established procedures and practices of verification and validation (V&V) and how they relate to AI algorithms. The objective is to inspire the readers to apply better V&V processes to their AI research. 

Verification is the process used to gain confidence in the correctness of a system compared to its requirements and specifications. Validation is the process used to assess if the system behaves as intended in its target environment. A system can verify well, meaning it does what it was specified to do, and not validate well, meaning it does not behave as intended.

V&V are challenging for systems that fully or partially involve AI algorithms despite V&V being a well-established and formalised practice. Many AI algorithms are black boxes that offer no transparency about how the algorithm operates. They respond with multiple correct answers to similar or even the same input. AI algorithms are not deterministic by design. Ideally, they can handle new situations well without needing to be trained for all situations. Therefore, accurately and exhaustively listing all the requirements against which these algorithms need to be verified is practically impossible.

V&V methods for complex robotic systems like automated vehicles are well established. Automated vehicles need to be capable of operating in an environment where unexpected situations occur. Various ISO standards (ISO 13485 – Medical Devices Quality Management, ISO 10218-1 – Robots and Robotic Devices, ISO 12207 – Systems and Software Engineering) describe different V&V practices required for software, systems and devices. These standards expect the use of multiple processes and practices to meet the required quality; no single practice covers the full extent of V&V, and each has shortcomings. The three techniques for V&V are formal verification, simulation-based verification and experiments [3]. The image below arranges these techniques by how realistic and coverable they are, where coverability refers to how much of the system a technique can analyse [1].

The image shows the framework for corroborative V&V [1].

An approach for simulation-based testing is coverage-driven verification (CDV). A two-tiered test generation approach where abstract test sequences are computed first and then concretised has been shown to achieve a high level of automation [2]. It is important to note that coverage includes code coverage, structural coverage (e.g. employing Finite State Machines) and functional coverage (including requirements and situations).

The images show the CDV process (left) and its translation to an automated vehicle scenario (right) [2].

Belief-desire-intention (BDI) agents used as models can further generate tests. These agents achieve coverage that is higher than or equivalent to model-checking automata. BDI agents can emulate the agency present in human-robot interactions; however, the cost of learning a belief set has to be considered [3]. Similarly, software testing agents can be used to generate tests for simulation-based automated vehicle verification. Such an agency-directed approach is robust and efficient, generating twice as many effective tests as pseudo-random test generation. Moreover, these agents can be encoded to behave naturally without compromising the effectiveness of test generation [4].

The hope is that, inspired by these techniques used to test robotic systems, we will promote V&V to first-class-citizen status when designing and implementing AI algorithms. V&V for AI algorithms requires innovation and a creative combination of existing techniques, such as intelligent agency-based test generation. The reward will be increased trust in AI algorithms.

References:

[1] Webster, Matt, et al. “A corroborative approach to verification and validation of human–robot teams.” The International Journal of Robotics Research 39.1 (2020): 73-99. https://journals.sagepub.com/doi/full/10.1177/0278364919883338

[2] Araiza-Illan, Dejanira, et al. “Systematic and realistic testing in simulation of control code for robots in collaborative human-robot interactions.” Towards Autonomous Robotic Systems: 17th Annual Conference, TAROS 2016, Sheffield, UK, June 26–July 1, 2016, Proceedings 17. Springer International Publishing, 2016. https://link.springer.com/chapter/10.1007/978-3-319-40379-3_3 

[3] Araiza-Illan, Dejanira, Anthony G. Pipe, and Kerstin Eder. “Model-based test generation for robotic software: Automata versus belief-desire-intention agents.” arXiv preprint arXiv:1609.08439 (2016). https://arxiv.org/abs/1609.08439

[4] Chance, Greg, et al. “An agency-directed approach to test generation for simulation-based autonomous vehicle verification.” 2020 IEEE International Conference On Artificial Intelligence Testing (AITest). IEEE, 2020. https://arxiv.org/abs/1912.05434


Essai 2023 Summer School – Matt Clifford

This blog post is written by AI CDT student, Matt Clifford

ESSAI 2023 – https://essai.si/

A few of us from the CDT (me, Jonny and Rachael) attended the ESSAI summer school from the 24th to the 28th of July 2023. ESSAI is the first European Summer School on Artificial Intelligence and was held in Ljubljana, Slovenia. There were a variety of interesting topics and classes on offer (https://essai.si/schedule/), but here I’ll share some of the classes that I attended. I’ll keep the information on each topic brief, but feel free to reach out to me if you would like to chat through any of the topics that might be useful to you, or if you would like to know more!

AutoMLhttps://www.automl.org/

Optimise machine learning hyperparameters and neural architectures automatically using various techniques (Bayesian optimisation etc.). Python packages for sklearn and PyTorch: https://pypi.org/project/smac/

https://github.com/automl/Auto-PyTorch

Very useful when you want a more objective training approach which will save you time, computation and, more importantly, frustration!
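As a flavour of what these tools automate, here is a minimal sketch of hyperparameter search using Optuna rather than SMAC (chosen purely because its API is compact); the model, data and search ranges are illustrative assumptions:

```python
# Sketch of Bayesian-style hyperparameter optimisation with Optuna.
# Model, dataset and ranges are assumptions for illustration only.
import optuna
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)

def objective(trial: optuna.Trial) -> float:
    # The sampler (TPE by default) uses past trials to propose
    # promising configurations, rather than searching blindly.
    n_estimators = trial.suggest_int("n_estimators", 50, 400)
    max_depth = trial.suggest_int("max_depth", 2, 32, log=True)
    model = RandomForestClassifier(
        n_estimators=n_estimators, max_depth=max_depth, random_state=0
    )
    return cross_val_score(model, X, y, cv=3).mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30)
print(study.best_params, study.best_value)
```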

Learning Beyond Static Datasets – https://owll-lab.com/

Exploring mechanisms to mitigate catastrophic forgetting when learning a new task in ML.

Topics related to: transfer learning, active learning, continual learning, lifelong learning, curriculum learning, open world learning, knowledge distillation.

A nice survey paper to map out the whole landscape – https://www.sciencedirect.com/science/article/pii/S089360802300014X?via%3Dihub

Uncertainty Quantification

Adding uncertainty estimates to a model (important given how overconfident neural networks can be!). Methods can either be inherent (Bayesian NNs etc.) or post hoc (calibration, ensembling, Monte-Carlo dropout) and can disentangle aleatoric and epistemic uncertainty measures.
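As a rough sketch of the post-hoc flavour, Monte-Carlo dropout keeps dropout active at inference time and treats the spread over repeated stochastic forward passes as an uncertainty estimate. The architecture and sample count below are arbitrary illustrative choices:

```python
# Monte-Carlo dropout sketch (PyTorch): dropout stays on at inference and
# the spread of repeated forward passes approximates model uncertainty.
# Architecture and sample count are arbitrary illustrative choices.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(),
    nn.Dropout(p=0.2),  # kept active below via model.train()
    nn.Linear(64, 1),
)

def mc_dropout_predict(model: nn.Module, x: torch.Tensor, n_samples: int = 50):
    model.train()  # train mode keeps dropout stochastic at inference
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    # Mean = prediction; standard deviation = (epistemic) uncertainty proxy.
    return preds.mean(dim=0), preds.std(dim=0)

mean, std = mc_dropout_predict(model, torch.randn(4, 10))
print(mean.squeeze(), std.squeeze())
```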

Fairness & Privacy –

https://aif360.readthedocs.io/en/latest/

https://fairlearn.org/

The president of Slovenia (plus her not-so-inconspicuous bodyguards) attended these talks, which was a bit of a surprise!

Explored navigating the somewhat conflicting landscape of statistical fairness by ensuring groups of people have the same model statistics. Picking which statistics, however, is not so easy, and it’s impossible to ensure all statistics match in real-life scenarios – https://arxiv.org/pdf/2304.06057.pdf

Also looked at privacy through anonymity (k-anonymity, l-diversity, t-closeness) and differential privacy. I won’t go into details, but thought I’d mention some of the main techniques currently used in academia and industry.

Again, let me know if you want to go into the details of anything that is useful or interesting to you!

Also, a side note: Slovenia is an amazingly beautiful country, and I can very much recommend it to anyone thinking of going! Here are a few photos:


AI UK 2023 Conference – Rachael Laidlaw

This blog post is written by AI CDT student, Rachael Laidlaw

Last month, I took the exciting opportunity to attend AI UK 2023, a large-scale event organised by The Alan Turing Institute. It was my first conference outside of Bristol, held in the heart of London at the Queen Elizabeth II Centre – right by Westminster Abbey and Big Ben – and it promised to offer a diverse programme of activities with a broad range of interactive content. As such, the sessions were packed with novel material delivered by leading international thinkers across multiple disciplines, resulting in an in-depth exploration of how data science and AI can be used to solve real-world challenges.

On the day

After a short walk to the venue from my hotel in Piccadilly Circus, I signed in and collected my demonstrator lanyard before heading up to the third floor of the building to meet my colleagues from the Jean Golding Institute. We would be spending the day manning a stall for the Local Air initiative in the environmental section of the Fleming room, engaging with attendees from both academia and industry about a pollution monitoring system designed to be mounted on e-scooters.

Highlights included:

  • using ground coffee to simulate particulate matter in the air and generate a live response from the prototype which was shown on the screen behind us,
  • contemplating alternative applications for the noise-pollution sound sensors (i.e., for use in the study of bats) with representatives from the UK Centre for Ecology and Hydrology, and
  • considering media coverage possibilities for the project with a journalist from the Financial Times.

Into the afternoon

When lunchtime arrived, I began circling the floor to visit the other stalls. Whilst wandering, I encountered displays of lots of innovative concepts, some of my favourites being:

  • a family of domestic social robot pets developed by the company Konpanion to alleviate loneliness,
  • progress on the tool BoneFinder, created by academics at the University of Manchester for use in clinical practice to segment skeletal structures,
  • a cardiac digital twin produced at King’s College London,
  • SketchX’s headset that gives you the ability to build your own metaverse from rough virtual drawings, and
  • the Data Hazards project, complete with holographic stickers and hi-vis jackets worn by another University of Bristol team to really bring data-oriented risk assessments to life.

Of the above, BoneFinder stood out to me in particular, owing to the fact that my current specialist focus is ecological computer vision, and, thus, seeing the same sort of technique being used for a medical application piqued my interest.

The talks

During a quiet period at the stall, I jumped at the chance of sitting in on a very well-attended talk by Gary Marcus from NYU on the power of ChatGPT and the unknowns surrounding the future of such pieces of technology. This was especially thought-provoking and relevant to my ongoing work towards a potential CHI publication.

After re-energising with some delicious cookies in the break, I also made it to an insightful panel discussion on shaping public perceptions of artificial intelligence, featuring Tracey Brown (the director of Sense About Science), Tania Duarte (the co-founder and CEO of We and AI) and David Leslie (a specialist in ethics and responsible innovation). This reminded me of the importance of keeping stakeholders in mind during all stages of research.

Closing moments

To round off the day, everyone came together to mingle and expand their networks over canapés and a significant amount of complimentary wine. We then gathered our belongings and headed out for dinner and to be tourists in London for the evening.

All in all, it was an incredibly fun and informative experience alongside a great team, and I’m already looking forward to future conferences!

2023 AAAI Conference Blog – Amarpal Sahota

This blog post is written by AI CDT Student Amarpal Sahota

I attended the 37th AAAI Conference on Artificial Intelligence from the 7th to the 14th of February 2023. This was my first in-person conference, and I was excited to travel to Washington D.C.

The conference schedule included labs and tutorials on February 7th–8th, the main conference on February 9th–12th, followed by the workshops on February 13th–14th.

Arriving and Labs / Tutorials

I arrived at the conference venue on the 7th of February to sign in and collect my name badge. The conference venue (Walter E. Washington Convention Center) was huge and had within it everything you could need, from areas to work or relax to restaurants and, of course, many halls and lecture theatres to host talks.

I was attending the conference to present a paper at the Health Intelligence Workshop. Two of my colleagues from the University of Bristol (Jeff and Enrico) were also attending to present at this workshop (we are pictured together below!).

The tutorials were an opportunity to learn from experts on topics that you may not be familiar with yourself. I attended tutorials on Machine Learning for Causal Inference, Graph Neural Networks and AI for epidemiological forecasting.

The AI for epidemiological forecasting tutorial was particularly engaging. The speakers were very good at giving an overview of historical epidemiological forecasting methods and recent AI methods used for forecasting, before introducing state-of-the-art AI methods that combine machine learning with our knowledge of epidemiology. If you are interested, the materials for this tutorial can be accessed at https://github.com/AdityaLab/aaai-23-ai4epi-tutorial .

Main conference Feb 9th – Feb 12th

The main conference began with a welcome talk in the ‘ball room’, which was set up with a stage and enough chairs to seat thousands. The welcome talk included an overview of the different tracks within the conference (AAAI Conference of AI, Innovative Application of AI, Educational Advances in AI) and statistics around conference participation and acceptance, and introduced the conference chairs.

The schedule for the main conference each day included invited talks and technical talks running from 8:30 am to 6pm. Each day this would be followed by a poster session from 6pm – 8pm allowing us to talk and engage with researchers in more detail.

For the technical talks I attended a variety of sessions, from Brain Modelling to ML for Time-Series / Data Streams and Graph-based Machine Learning. Noticeably, not all of the sessions were in person: they were hybrid, with some speakers presenting online. This was disappointing but understandable given visa restrictions for travel to the U.S.

I found that many of the technical talks became difficult to follow very quickly, as they were largely aimed at experts in the respective fields. I particularly enjoyed some of the time-series talks, as these relate to my area of research. I also enjoyed the poster sessions, which allowed us to talk with fellow researchers in a more relaxed environment and ask questions directly to understand their work.

For example, I enjoyed the talk ‘SVP-T: A Shape-Level Variable-Position Transformer for Multivariate Time Series Classification’ by PhD researcher Rundong Zhuo. At the poster session I was able to follow up with Rundong to ask more questions and understand his research in detail. We are pictured together below!

Workshops Feb 13th – 14th

I attended the 7th International Workshop On Health Intelligence from the 13th to the 14th of February. The workshop began with opening remarks from the co-chair Martin Michalowski, followed by a talk from our first keynote speaker, Professor Randi Foraker, who spoke about her research on building trust in AI for improving health outcomes.

This talk was followed by paper presentations, with papers on related topics grouped into sessions. My talk was in the second session of the day, titled ‘Classification’. My paper (pre-print here) is titled ‘A Time Series Approach to Parkinson’s Disease Classification from EEG’. The presentation went reasonably smoothly, and I had a number of interesting questions from the audience about applications of my work and the methods I had used. I am pictured giving the talk below!

The second half of the day focused on the hackathon, whose theme was biological age prediction. Biological ageing is a latent concept with no agreed-upon method of estimation. Biological age tries to capture a sense of how much you have aged in the time you have been alive. Certain factors, such as stress and poor diet, can be expected to age individuals faster, so two people of the same chronological age may have different biological ages.

The hackathon opened with a talk on biological age prediction by Morgan Levin (The founding Principal Investigator at Altos Labs). Our team for the hackathon included four people from the University of Bristol – myself , Jeff , Enrico and Maha. Jeff (pictured below) gave the presentation for our team. We would have to wait until the second day of the conference to find out if we won one of the three prizes.

The second day of the workshop consisted of further research talks, a poster session and an awards ceremony in the afternoon. We were happy to be awarded the 3rd place prize of $250 for the hackathon! The final day concluded at around 5pm. I said my goodbyes and headed to the Washington D.C. airport for my flight back to the U.K.