Highlights from NeurIPS 2022 and the 2nd Interactive Learning for NLP Workshop – Dr Edwin Simpson

This blog post is written by lecturer in Computer Science, Dr Edwin Simpson

In November I was lucky enough to attend NeurIPS 2022 in person in New Orleans and to take part as a co-organiser of InterNLP, our second interactive learning for NLP workshop. I had many interesting discussions at posters, talks and coffee breaks, and took loads of photos of posters. It was hard to write up my highlights without the post becoming endlessly long, so here is my attempt to pick out a handful of papers that caught my eye and to tell you a little about how our workshop unfolded.

Main Conference

One topic generating a lot of buzz was in-context learning, where language models learn to perform new tasks from examples given in the model’s input prompt, without updating their weights. Models like GPT-3 can perform in-context learning from small numbers of examples. Garg et al. presented an interesting paper that tries to understand what classes of functions can be learned in this way [1]. They were able to train Transformers that learn function classes including linear functions and two-layer neural networks.
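To make the linear-function case concrete, here is a minimal sketch under illustrative assumptions: a prompt is a sequence of input/output pairs from a randomly drawn linear function followed by a query input, and a natural reference point for what a model should predict is ordinary least squares fitted to the in-context examples alone. No Transformer is trained here; the function names and sizes are hypothetical.

```python
import numpy as np

def sample_prompt(k, dim, rng):
    """Draw a random linear function f(x) = w.x, k in-context examples and one query."""
    w = rng.normal(size=dim)
    xs = rng.normal(size=(k + 1, dim))
    ys = xs @ w
    # A model would see (x_1, y_1, ..., x_k, y_k, x_query) and must predict y_query.
    return xs[:k], ys[:k], xs[k], ys[k]

def least_squares_prediction(xs, ys, x_query):
    """Baseline estimator fitted only on the examples in the prompt."""
    w_hat, *_ = np.linalg.lstsq(xs, ys, rcond=None)
    return x_query @ w_hat

rng = np.random.default_rng(0)
xs, ys, x_query, y_query = sample_prompt(k=10, dim=5, rng=rng)
# With noiseless data and k >= dim, least squares recovers the function exactly.
print(abs(least_squares_prediction(xs, ys, x_query) - y_query))
```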


However, for few-shot learning, in-context learning may not be the best solution: Liu et al. [2] showed that fine-tuning a model by introducing a small number of additional weights can be cheaper and produce more accurate models.
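A minimal sketch of the kind of "small number of additional weights" involved, assuming the activation-rescaling style that Liu et al. call (IA)³: the pretrained weight matrix stays frozen, and only a learned vector, initialised to ones, rescales the layer’s output activations. The layer sizes are arbitrary, and a real model would apply such vectors inside attention and feed-forward blocks rather than to a single matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 768, 3072

# Frozen pretrained weights: never updated during fine-tuning.
W_frozen = rng.normal(scale=0.02, size=(d_in, d_out))

# The only trainable parameters: one scaling value per output activation.
# Initialising to ones means the adapted layer starts out identical to the original.
l_scale = np.ones(d_out)

def adapted_layer(x):
    """Element-wise rescaling of the frozen layer's activations."""
    return (x @ W_frozen) * l_scale

x = rng.normal(size=(4, d_in))
assert np.allclose(adapted_layer(x), x @ W_frozen)  # unchanged before training

print(f"trainable: {l_scale.size:,} vs frozen: {W_frozen.size:,}")
```

The point of the design is the parameter count: per fine-tuned task, only the small scaling vectors need to be learned and stored.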



Another interesting NLP paper, from Jian, Gao and Vosoughi [3], learns sentence embeddings using image and audio data alongside a text training set. The method works by creating pairs of images (or audio clips) using data augmentation, which are then embedded and fed through a BERT-like transformer to provide additional data for contrastive learning. This is especially useful for low-resource languages and domains, and it is really interesting that we can learn from different modalities without any parallel examples.
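For readers new to contrastive learning, here is a minimal sketch of a generic InfoNCE-style objective, assuming two augmented views of each example have already been embedded: each embedding should be closer to its own paired view than to every other item in the batch. This is a standard formulation rather than the exact loss in the paper, and the temperature value is an assumption.

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """Contrastive loss where row i of z_a and row i of z_b form a positive pair."""
    # L2-normalise so that dot products are cosine similarities.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature  # (batch, batch) similarity matrix
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal; every other item in the batch acts as a negative.
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 32))
aligned = anchors + 0.05 * rng.normal(size=(8, 32))  # views that match their anchors
shuffled = np.roll(aligned, 1, axis=0)               # every view paired with the wrong anchor
print(info_nce(anchors, aligned), info_nce(anchors, shuffled))
```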

Many machine learning researchers are concerned with models that produce well-calibrated probabilities, but what difference does calibration make to end users? Vodrahalli, Gerstenberg and Zou [4] investigated a binary prediction task in which a classifier provides advice to a user, along with its confidence. They found that exaggerating the model’s confidence led the user to perform better. So, although the classifier was uncalibrated and had higher training loss, the complete human-AI system was more effective, which shows how important it is for ML researchers to consider real-world use cases for their models.
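To illustrate what "exaggerating confidence" does to calibration, here is a sketch on synthetic data: it starts from perfectly calibrated probabilities, sharpens them by dividing the logit by a temperature below 1, and measures the expected calibration error (ECE). The temperature of 0.5 and the ten-bin ECE are illustrative choices, not the settings used in the paper.

```python
import numpy as np

def exaggerate(p, temperature=0.5):
    """Push probabilities away from 0.5 by dividing the logit by a temperature < 1."""
    logit = np.log(p) - np.log1p(-p)
    return 1.0 / (1.0 + np.exp(-logit / temperature))

def expected_calibration_error(p, y, n_bins=10):
    """Weighted average gap between predicted probability and observed frequency of y=1, per bin."""
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            ece += mask.mean() * abs(p[mask].mean() - y[mask].mean())
    return ece

rng = np.random.default_rng(0)
p_true = rng.uniform(0.05, 0.95, size=20_000)
# Labels drawn from p_true, so p_true is calibrated by construction.
y = (rng.uniform(size=p_true.size) < p_true).astype(float)

print(expected_calibration_error(p_true, y), expected_calibration_error(exaggerate(p_true), y))
```

The exaggerated probabilities score worse on ECE, which is exactly the kind of metric the study suggests may not capture what matters once a human is in the loop.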

Sticking with the topic of uncertainty, Bayesian deep learning aims to quantify uncertainty in complex neural network models, but is challenging to apply as it is difficult to specify a suitable prior distribution. Ideally, we’d specify a prior over the functions that the network encodes, rather than over individual network weights. Tran et al. introduce a method for setting functional priors in Bayesian neural networks by aligning them with Gaussian processes. It will be interesting to try out their approach in some deep learning applications where quantifying uncertainty is important.
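Here is a deliberately simplified sketch of what "aligning a network prior with a Gaussian process" can mean, under illustrative assumptions: sample functions from a small Bayesian neural network prior at a handful of input points, compare their empirical covariance with an RBF Gaussian-process kernel, and pick the weight prior scale that matches best. The method in the paper is more sophisticated and may use a different distance and optimiser; the grid search and Frobenius distance here are stand-ins.

```python
import numpy as np

def rbf_kernel(x, lengthscale=1.0, variance=1.0):
    """Target GP prior covariance at the measurement points x."""
    d = x[:, None] - x[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def bnn_prior_functions(x, weight_std, n_hidden=100, n_samples=2000, seed=0):
    """Sample functions from a one-hidden-layer tanh network with N(0, weight_std^2) weights."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, weight_std, size=(n_samples, 1, n_hidden))
    b1 = rng.normal(0.0, weight_std, size=(n_samples, 1, n_hidden))
    # Output weights scaled by 1/sqrt(n_hidden) so the output variance stays finite.
    w2 = rng.normal(0.0, weight_std / np.sqrt(n_hidden), size=(n_samples, n_hidden, 1))
    h = np.tanh(x[None, :, None] @ w1 + b1)  # (n_samples, n_points, n_hidden)
    return (h @ w2)[..., 0]                   # (n_samples, n_points)

def prior_mismatch(x, weight_std):
    """Frobenius distance between the BNN's empirical function covariance and the GP kernel."""
    f = bnn_prior_functions(x, weight_std)
    return np.linalg.norm(np.cov(f, rowvar=False) - rbf_kernel(x))

x = np.linspace(-2.0, 2.0, 10)
grid = [0.1, 0.5, 1.0, 2.0, 5.0]
scores = {s: prior_mismatch(x, s) for s in grid}
best_std = min(scores, key=scores.get)
print(best_std, scores)
```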

At the poster sessions, I also enjoyed learning about the wide range of new benchmarks and datasets that will enable lots of exciting future work. For example, one that relates to my own work and that I’d like to make use of is BIGBIO [5], which makes a number of biomedical NLP datasets more accessible and will hopefully lead to more reproducible results.

Juho Kim, an associate professor at the Korea Advanced Institute of Science and Technology (KAIST), gave a keynote on his vision of Interaction-Centric AI. He called on AI researchers to move beyond data-centric or model-centric research by rethinking the complete AI research process around the user experience of AI. Juho’s talk gave examples of how an interaction-centric approach may affect the way we evaluate models, which cases we focus on when trying to improve accuracy, how to incentivise users to engage with AI, and several other aspects of interaction-centric AI that his lab has been working on. He demonstrated Stylette, a tool that lets you use natural language to change the appearance of a website. The keynote ended with a call to action for AI researchers to rethink performance metrics, the design process and collaboration, particularly with HCI researchers.

Geoff Hinton appeared remotely from home to present the Forward-Forward algorithm, a method for training neural networks without backpropagation that could give insights into how learning in the cortex takes place. His experiments showed some promising early results, and in the Q&A Geoff talked about coding the experiments himself. A preliminary arXiv paper is now out [6].
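To give a flavour of how Forward-Forward avoids backpropagation, here is a minimal sketch of a single layer: its "goodness", the sum of squared activities, is pushed above a threshold for positive (real) data and below it for negative data, using a gradient computed within that layer only. Hinton’s paper also normalises activity vectors between layers and discusses how to generate negative data; both are omitted here, and the toy data, threshold and learning rate are assumptions.

```python
import numpy as np

def goodness(h):
    """Forward-Forward 'goodness': sum of squared activities in a layer."""
    return np.sum(h ** 2, axis=1)

def sigmoid(a):
    return 0.5 * (1.0 + np.tanh(0.5 * a))  # numerically stable logistic function

def ff_layer_step(W, x_pos, x_neg, theta=2.0, lr=0.03):
    """One local update for a ReLU layer: raise goodness on positive data, lower it on negative data.
    The gradient is computed within this layer only; nothing is backpropagated from later layers."""
    grad = np.zeros_like(W)
    for x, sign in ((x_pos, 1.0), (x_neg, -1.0)):
        h = np.maximum(x @ W, 0.0)
        p = sigmoid(sign * (goodness(h) - theta))
        # Per-example loss is -log p; its derivative w.r.t. goodness is -sign * (1 - p).
        dL_dg = -sign * (1.0 - p)
        # d(goodness)/dh = 2h, and 2h is already zero wherever the ReLU is inactive.
        dL_dz = dL_dg[:, None] * 2.0 * h
        grad += x.T @ dL_dz / len(x)
    return W - lr * grad

rng = np.random.default_rng(0)
x_pos = rng.normal(1.0, 0.5, size=(200, 10))   # stand-in for real data
x_neg = rng.normal(-1.0, 0.5, size=(200, 10))  # stand-in for negative data
W = rng.normal(0.0, 0.1, size=(10, 20))
for _ in range(200):
    W = ff_layer_step(W, x_pos, x_neg)

print(goodness(np.maximum(x_pos @ W, 0.0)).mean(), goodness(np.maximum(x_neg @ W, 0.0)).mean())
```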

1. Garg et al., What Can Transformers Learn In-Context? A Case Study of Simple Function Classes, https://arxiv.org/abs/2208.01066

2. Liu et al., Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning, https://arxiv.org/abs/2205.05638

3. Jian, Gao and Vosoughi, Non-Linguistic Supervision for Contrastive Learning of Sentence Embeddings, https://arxiv.org/pdf/2209.09433.pdf

4. Vodrahalli, Gerstenberg and Zou, Uncalibrated Models Can Improve Human-AI Collaboration, https://arxiv.org/abs/2202.05983

5. Fries et al., BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing, https://arxiv.org/abs/2206.15076

6. Hinton, The Forward-Forward Algorithm: Some Preliminary Investigations, https://arxiv.org/abs/2212.13345

InterNLP Workshop

2022 was our second edition of the InterNLP workshop, and we were very happy that the community grew, this year with 20 accepted papers and a chance to meet in person! Some of the videos are on YouTube at https://www.youtube.com/@InterNLP, and others will hopefully be available soon in the NeurIPS archives.

The programme was packed with impressive invited talks from Karthik Narasimhan (Princeton), John Langford (Microsoft), Dan Weld (University of Washington), Anca Dragan (UC Berkeley) and Aida Nematzadeh (DeepMind). To pick out just a few, Karthik presented recent work on semantic supervision [1] for few-shot generalization and personalization, which learns from semantic descriptions of classes, providing a way to instruct models through text. Anca Dragan talked about interactive agents that go beyond following instructions about exactly how to perform a task, to inferring the user’s goals, preferences, and constraints. She emphasized that the way people refer to desired actions provides important information about their preferences, and therefore we can infer, from a user’s language, reward functions that reflect their preferences. Aida Nematzadeh compared self-supervised pretraining to language learning in childhood, which involves interacting with other people. Her talk focused on the evaluation of neural representations, and she called for real-world evaluations, strong baselines and probing to provide a much more thorough way of uncovering the strengths and weaknesses of pretrained models.
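The core idea of semantic supervision, scoring an input against a natural-language description of each class rather than against a fixed label index, can be sketched as follows. This assumes a toy bag-of-words encoder in place of the trained text encoders a real system would learn, and the class names and descriptions are hypothetical.

```python
import numpy as np
from collections import Counter

def embed(text, vocab):
    """Toy bag-of-words embedding; a real system would use a learned text encoder."""
    counts = Counter(text.lower().split())
    v = np.array([counts[w] for w in vocab], dtype=float)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def classify(text, class_descriptions):
    """Score an input against a natural-language description of each class."""
    vocab = sorted({w for d in [text, *class_descriptions.values()] for w in d.lower().split()})
    x = embed(text, vocab)
    scores = {label: float(x @ embed(desc, vocab)) for label, desc in class_descriptions.items()}
    return max(scores, key=scores.get)

descriptions = {
    "sport": "articles about football matches goals players and teams",
    "finance": "articles about markets shares banks and interest rates",
}
print(classify("the striker scored two goals for the home teams", descriptions))
# Adding a class only needs a new description, not new labelled examples.
descriptions["weather"] = "articles about rain storms temperature and forecasts"
print(classify("heavy rain and storms are in the forecast", descriptions))
```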

The contributed talks and posters showcased a wide range of work, from human-in-the-loop learning techniques to software libraries and benchmark datasets. For example, PyTAIL [2] is a Python library for active learning that collects new labelling rules and customizes lexicons as well as collecting labels. Mohanty et al. [3] developed the IGLU challenge, in which an agent has to perform tasks by following natural language instructions; their presentation at InterNLP explained how they collected the data. The RL4LMs library [4] provides a way to optimize language generation models using reinforcement learning, as a way to adapt them to human preferences; the paper [4] also presents a benchmark, GRUE, for evaluating RL methods for language generation. Majumder and McAuley [5] investigate the use of explanations to debias NLP models while maintaining a good trade-off between predictive performance and bias mitigation.
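As a reminder of what the label-collection step of active learning looks like, here is a minimal sketch of least-confidence sampling, one common query strategy: ask the annotator about the unlabelled items the current model is least sure of. It is not claimed to be PyTAIL’s strategy, and it omits the rule and lexicon collection that PyTAIL adds; the pool probabilities are hypothetical.

```python
import numpy as np

def least_confident(probs, k):
    """Return indices of the k pool items whose top predicted class probability is lowest."""
    confidence = probs.max(axis=1)
    return np.argsort(confidence)[:k]

# Hypothetical model predictions over an unlabelled pool of five items and three classes.
pool_probs = np.array([
    [0.98, 0.01, 0.01],
    [0.40, 0.35, 0.25],
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],
    [0.90, 0.05, 0.05],
])
print(least_confident(pool_probs, k=2))  # items to send to the annotator next
```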





At the end of the day, I got to ask some very smart people a lot of questions during our panel discussion – thanks to John Langford, Karthik Narasimhan, Aida Nematzadeh, and Alane Suhr for taking part, and thanks to the audience for some great interactions too. The wide-ranging discussion touched on the evaluation of interactive systems (how to use static data for evaluation, evaluating how well models adapt to user input), working with researchers and users from other fields, different forms of interaction besides language, and challenges that are specific to interactive NLP.

We plan to be back at a future conference (not sure which one yet!) for the next iteration of InterNLP. Large language models and in-context learning are clearly revolutionizing this space in some ways, but I’m convinced we still have a lot of work to do to design interactive machine learning systems that are accountable, reliable, and require fewer resources.

Thank you to Nguyễn Xuân Khánh for letting us include his InterNLP workshop photos.

1. Aggarwal, Deshpande and Narasimhan, SemSup-XC: Semantic Supervision for Zero and Few-shot Extreme Classification, https://arxiv.org/pdf/2301.11309.pdf

2. Mishra and Diesner, PyTAIL: Interactive and Incremental Learning of NLP Models with Human in the Loop for Online Data, https://internlp.github.io/documents/2022/papers/24.pdf

3. Mohanty et al., Collecting Interactive Multi-modal Datasets for Grounded Language Understanding, https://internlp.github.io/documents/2022/papers/17.pdf

4. Ramamurthy et al., Is Reinforcement Learning (Not) for Natural Language Processing?: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization, https://arxiv.org/abs/2210.01241

5. Majumder and McAuley, InterFair: Debiasing with Natural Language Feedback for Fair Interpretable Predictions, https://arxiv.org/abs/2210.07440

CDT Research Showcase Day 2 – 31 March 2022

This blog post is written by CDT Student Matt Clifford

The second day of the research showcase focused on the future of interactive AI. The future is, of course, hard to predict, so the day was spent highlighting three key areas: AI in green/sustainable technologies, AI in education and AI in creativity.

Each of the three areas was introduced by a talk from industry or academia.

AI in green/sustainable technologies, Dr. Henk Muller, XMOS

Henk is CTO of XMOS, a Bristol-based microchip designer. XMOS’s vision is to provide low-power solutions that enable AI to be deployed on edge devices rather than in the cloud.

Edge devices benefit from lower latency and cost, and they are more private, since all computation is executed locally. However, edge devices have limited power and memory, which restricts the complexity of the models that can be used: models have to be reduced in size or numerical precision to fit within these limits. I see this as a positive for model design and implementation. Many machine learning engineers quote Occam’s razor as a philosophical pillar of design, but in practice it is far too tempting to throw power-hungry supercomputer resources at problems where perhaps they aren’t needed.
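Reducing numerical precision is often done by quantising weights to 8-bit integers. Here is a generic sketch of symmetric per-tensor int8 quantisation; the talk did not describe XMOS’s own toolchain, so this is an illustration of the general technique, not their method.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantisation: map floats onto integers in [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values and the scale."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=(256, 256)).astype(np.float32)
q, scale = quantize_int8(w)
print(w.nbytes, q.nbytes)                            # storage drops by a factor of 4
print(np.abs(dequantize(q, scale) - w).max(), scale)  # rounding error stays within half a step
```

The trade-off is a bounded rounding error per weight in exchange for a quarter of the memory and cheaper integer arithmetic, which is what makes such models fit on constrained hardware.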

It’s refreshing to see how the constraints of XMOS’s chips open the door to green and sustainable AI research and innovation, in a way that many other hardware manufacturers don’t encourage.

AI in Education, Dr. Niall Twomey, KidsLoop

Niall Twomey, KidsLoop, giving the AI in Education talk

AI for, in and with education has the potential to help teachers by providing personalised assistants in the classroom, which could help students when the teacher’s attention is elsewhere.

The most recent work from KidsLoop addresses the needs of neurodivergent students, concentrating on making learning more appropriate to each student’s innate ability rather than to neurotypical standards. AI in education has the potential to reduce biases towards neurotypical students in the education system, with a more dynamic method of teaching that scales well to larger class sizes. I think that these prospects are crucial in the battle to reduce stigma and overcome the challenges faced by neurodivergent students.

You can find the details of the methods used in their paper: Equitable Ability Estimation in Neurodivergent Student Populations with Zero-Inflated Learner Models, Niall Twomey et al., 2022. https://arxiv.org/abs/2203.10170

It’s worth mentioning that KidsLoop will be looking for a research intern soon, so if you are interested in this exciting area of AI, keep your eyes peeled for the announcements.

AI in Creativity, Prof. Atau Tanaka, University of Bristol

Atau Tanaka giving the AI and Creativity talk, with Peter Flach leading the Q&A session

The third and final topic of the day was AI in a creative environment, specifically for music. Atau showcased an instrument he designed which uses electrical signals produced by the body’s muscles to capture a person’s gesture as input. He assigns each gesture input to a corresponding sound. From here a regression model is fitted, enabling interpolation between the gestures. This allows novel sounds to be synthesised from new gestures. The sounds themselves are experimental, dissonant, and distant from the original input sounds, yet Atau seems to have control and intent over the whole process.
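The gesture-to-sound mapping described above can be sketched as a regression from gesture features to synthesiser parameters. This assumes plain linear least squares as the simplest choice; the actual model in Atau’s instrument may differ, and the gesture features, parameter names and values below are all hypothetical.

```python
import numpy as np

# Hypothetical training pairs: each row is a gesture feature vector (e.g. muscle-signal
# amplitudes from three sensors) and the synthesiser parameters assigned to that gesture.
gestures = np.array([
    [0.1, 0.1, 0.1],   # relaxed hand
    [0.9, 0.2, 0.1],   # fist
    [0.2, 0.9, 0.6],   # wrist flexed
    [0.5, 0.5, 0.9],   # fingers spread
])
synth_params = np.array([
    [220.0, 0.1],      # [pitch in Hz, filter resonance]
    [440.0, 0.8],
    [330.0, 0.3],
    [550.0, 0.5],
])

# Fit a linear regression (with a bias column) by least squares.
X = np.hstack([gestures, np.ones((len(gestures), 1))])
coef, *_ = np.linalg.lstsq(X, synth_params, rcond=None)

def gesture_to_sound(g):
    """Map any gesture, including ones never demonstrated, to synthesiser parameters."""
    return np.append(g, 1.0) @ coef

# A gesture halfway between "relaxed" and "fist" yields parameters halfway between their sounds.
print(gesture_to_sound((gestures[0] + gestures[1]) / 2))
```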

The interactive ML training process Atau uses offers a glimpse of how tangible ML can be, something we rarely get to experiment with. I would love to see an active-learning-style component added to the learning algorithm to strengthen the interaction between human and machine further.

Creativity and technology are intertwined at their core, and I am always excited to see how emerging technologies can influence creativity and how creatives find ways to redefine creativity with technology.

Breakout Groups and Plenary Discussion

Discussion groups during the Research Showcase

After lunch we split into three groups to share thoughts on our favourite topic area. It was great to share opinions and motivations amongst one another. The overall drive for discussion was to flesh out a rough idea that could be taken forward as a research project with motivations, goals, deliverables etc. A great exercise for us first years to undertake before we enter the research phase of the CDT!

Closing Thoughts

I look forward to having more of these workshop sessions in the future as COVID-19 restrictions ease. I personally find them highly inspirational, and I believe that the upcoming fourth IAI CDT cohort will benefit significantly from having more in-person events like these workshops. I think that they will be especially beneficial for exploring, formulating and collaborating on summer project ideas, which is arguably one of the most pivotal aspects of the CDT.

CDT Research Showcase Day 1 – 30 March 2022

Blog post written by CDT Student Oli Deane.

This year’s IAI CDT Research Showcase represented the first real opportunity to bring the entire CDT together in the real world, permitting in-person talks and face-to-face meetings with industry partners.

Student Presentations

Grant Stevens giving his Pecha Kucha talk

The day began with a series of quickfire talks from current CDT students. Presentations had a different feel this year as they followed a Pecha Kucha style; speakers had ~6 minutes to present their research with individual slides automatically progressing after 20 seconds. As a result, listeners received a whistle-stop tour of each project without delving into the nitty gritty details of research methodologies.

Indeed, this quickfire approach highlighted the sheer diversity of projects carried out in the CDT. The presented projects had a bit of everything: from a data set for analyzing great ape behaviors to classification models that determine dementia progression from time-series data.

It was fascinating to see how students incorporated interactivity into project designs. Grant Stevens, for example, uses active learning and outlier detection methods to classify astronomical phenomena. Tashi Namgyal has developed MIDI-DRAW, an interactive musical platform that permits the curation of short musical samples with user-provided hand-drawn lines and pictures. Meanwhile, Vanessa Hanschke is collaborating with LV to explore how better ethical practices can be incorporated into the data science workflow; for example, her current work explores an ethical ‘Fire-drill’ – a framework of emergency responses to be deployed in response to the identification of problematic features in existing data-sets/procedures. This is, however, just the tip of the research iceberg and I encourage readers to check out all ongoing projects on the IAI CDT website.

Industry Partners

Gustavo Medina Vazquez’s EDF Energy presentation with the Q&A session being led by CDT Director Peter Flach

Next, representatives from three of our industry partners presented overviews of their work and their general involvement with the CDT.

First up was Dylan Rees, a Senior Data Engineer at LV. With a data science team stationed in MVB at the University of Bristol, LV are heavily involved with the university’s research. As well as working with Vanessa to develop ethical practices in data science, they run a cross-CDT datathon in which students battle to produce optimal models for predicting fair insurance quotes. Rees emphasized that LV want responsible AI to be at the core of what they do, highlighting how insurance is a key example of how developments in transparent, and interactive, AI are crucial for the successful deployment of AI technologies. Rees closed his talk with a call to action: the LV team are open to, and eager for, any collaboration with UoB students – whether it be to assist with data projects or act as “guinea pigs” for advancing research on responsible AI in industry.

Gustavo Medina Vazquez from EDF Energy then discussed their work in the field and outlined some examples of past collaborations with the CDT. They are exploring how interactive AI methods can assist in the development and maintenance of green practices – for example, one ongoing project uses computer vision to identify faults in wind turbines. EDF previously collaborated with members of the CDT 2019 cohort as they worked on an interactive search-based mini project.

Finally, Dr. Claire Taylor, a representative from QinetiQ, highlighted how interactive approaches are a major focus of much of their research. QinetiQ develop AI-driven technologies in a diverse range of sectors: from defense to law enforcement, aviation to financial services. Dr. Taylor discussed the changing trends in AI, outlining how previously fashionable methods that have lost focus in recent years are making a comeback, courtesy of the AI world’s recognition that we need more interpretable, and less compute-intensive, solutions. QinetiQ also sponsor Kevin Flannagan’s (CDT 2020 cohort) PhD project, in which he explores the intersection between language and vision, creating models which ground words and sentences within corresponding videos.

Academic Partners and Poster Session

Research Showcase poster session

To close out the day’s presentations, our academic partners discussed their relevant research. Dr. Oliver Ray first spoke of his work in Inductive Logic Programming before Dr. Paul Marshall gave a perspective from the world of human computer interaction, outlining a collaborative cross-discipline project that developed user-focused technologies for the healthcare sector.

Finally, a poster session rounded off proceedings; a studious buzz filled the conference hall as partners, students and lecturers alike discussed ongoing projects, questioning existing methods and brainstorming potential future directions.

In all, this was a fantastic day of talks, demonstrations, and general AI chat. It was an exciting opportunity to discuss real research with industry partners and I’m sure it has produced fruitful collaborations.

I would like to end this post with a special thank you to Peter Relph and Nikki Horrobin who will be leaving the CDT for bigger and better things. We thank them for their relentless and frankly spectacular efforts in organizing CDT events and responding to students’ concerns and questions. You will both be sorely missed, and we all wish you the very best of luck with your future endeavors!

BIAS Day 1 Review: ‘Interactive AI’

This review of the 1st day of the BIAS event, ‘Interactive AI’, is written by CDT Student Vanessa Hanschke

The Bristol Interactive AI Summer School (BIAS) was opened with the topic of ‘Interactive AI’, congruent with the name of the hosting CDT. Three speakers presented three different perspectives on human-AI interactions.

Dr. Martin Porcheron from Swansea University started with his talk “Studying Voice Interfaces in the Home”, looking at AI in one of the most private of all everyday contexts: smart speakers placed in family homes. Using an ethnomethodological approach, Dr. Porcheron and his collaborators recorded and analysed snippets of family interactions with an Amazon Echo. They used a purpose-built device to record conversations before and after activating Alexa. While revealing how the interactions with the conversational AI were embedded in the life of the home, this talk was a great reminder of how messy real life may be compared to the clean input-output expectations AI research can sometimes set. The study was also a good example of the challenge of designing research in a personal space, while respecting the privacy of the research subjects.

Taking a more industrial view of human-AI interactions, Dr Alison Smith-Renner from Dataminr followed with her talk “Designing for the Human-in-the-Loop: Transparency and Control in Interactive ML”. How can people collaborate with an ML (Machine Learning) model to achieve the best outcome? Dr. Smith-Renner used topic modelling to understand the human-in-the-loop problem with respect to two aspects: (1) Transparency: methods for explaining ML models and their results to humans; and (2) Control: how users can provide feedback to systems. In her work, she looks at how users are affected by the different ways ML can apply their feedback and whether model updates are consistent with the behaviour the users anticipate. I found the different expectations that the participants in her study had of ML particularly interesting, as well as how the users’ topic expertise influenced how much control they wanted over a model.

Finally, Prof. Ben Shneiderman from the University of Maryland concluded with his session titled “Human-Centered AI: A New Synthesis”, giving a broader view on where AI should be heading by building a bridge between HCI (Human-Computer Interaction) and AI. For the question of how AI can be built in a way that enhances people, Prof. Shneiderman presented three answers: the HCAI framework, design metaphors and governance structures, which are featured in his recently published book. Hinting towards day 4’s topic of responsible AI, Prof. Shneiderman drew a compelling comparison between safety in the automobile industry and responsible AI. While the need for unlimited innovation is often used as an excuse for a deregulated industry, regulation demanding higher safety in cars led to an explosion of innovation in safety belts and air bags that the automobile industry is now proud of. The same can be observed for the right to explainability in GDPR and the ensuing innovation in explainable AI. At the end of the talk, Prof. Shneiderman called on AI researchers to create a future that serves humans, is sustainable and “makes the world warmer and kinder”.

It was an inspiring afternoon for anyone interested in the intersection of humans and AI, especially for researchers like me who are trying to understand how we should design interfaces and interactions so that we can gain the most benefit as humans from powerful AI systems.

Neglected Aspects of the COVID-19 pandemic

This week’s post is written by IAI CDT student Gavin Leech.
I recently worked on two papers looking at neglected aspects of the COVID-19 pandemic. I learned more than I wanted to know about epidemiology.

The first: how much do masks do?

There were a lot of confusing results about masks last year.
We know that proper masks worn properly protect people in hospitals, but zooming out and looking at the population effect led to very different results, from basically nothing to a huge halving of cases.
Two problems: these were, of course, observational studies, since we don’t run experiments on the scale of millions. (Or not intentionally anyway.) So there’s always a risk of missing some key factor and inferring the completely wrong thing.
And there wasn’t much data on the number of people actually wearing masks, so we tended to use the timing of governments making it mandatory to wear masks, assuming that this caused the transition to wearing behaviour.
It turns out that the last assumption is mostly false: across the world, people started to wear masks before governments told them to. (There are exceptions, like Germany.) The correlation between mandates and wearing was about 0.32. So mask mandate data provide weak evidence about the effects of mass mask-wearing, and past results are in question.
Instead, we use self-reported mask-wearing from the largest survey of mask wearing (n = 20 million, stratified random sampling), and obtain our effect estimates from 92 regions across 6 continents. We use the same model to infer both the effect of government mandates to wear masks and the effect of self-reported wearing, by linking confirmed case numbers to the level of wearing or to the presence of a government mandate. The model is Bayesian (using past estimates as a starting point) and hierarchical (composed of per-region submodels).
For an entire population wearing masks, we infer a 25% [6%, 43%] reduction in R, the “reproduction number” or number of new cases per case (B).
In summer last year, given self-reported wearing levels around 83% of the population, this cashed out into a 21% [3%, 23%] reduction in transmission due to masks (C).
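As a rough consistency check on these two point estimates, assume (for this sketch only, not as a description of the paper’s exact model) that mask-wearing scales R log-linearly, R = R0 · exp(−α · w) for a wearing fraction w. A 25% reduction at full wearing fixes α, and plugging in 83% wearing gives roughly the 21% quoted above. The uncertainty intervals come from the full Bayesian posterior and are not reproduced by this calculation.

```python
import math

full_wearing_reduction = 0.25  # inferred reduction in R if everyone wears masks
wearing_level = 0.83           # self-reported wearing level quoted above

# Assumed log-linear effect: R = R0 * exp(-alpha * wearing_level)
alpha = -math.log(1.0 - full_wearing_reduction)
reduction_at_observed_level = 1.0 - math.exp(-alpha * wearing_level)
print(f"{reduction_at_observed_level:.0%}")
```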
One thing that marks us out is being obsessive about checking that this is robust, that is, that different plausible model assumptions don’t change the result. We test 123 different assumptions about the nature of the virus, the epidemic monitoring, and the way that masks work. It’s heartening to see that our results don’t change much (D).
It was an honour to work on this with amazing epidemiologists and computer scientists. But I’m looking forward to thinking about AI again, just as we look forward to hearing the word “COVID” for the last time.

The second: how much does winter do?

We also look at seasonality: the annual cycle in virus potency. One bitter argument you heard a lot in 2020 was about whether we’d need lockdown in the summer, since you expect respiratory infections to fall a lot in the middle months.

We note that the important models of what works against COVID fail to account for this. We look at the dense causal web involved:

This is a nasty inference task, and data is lacking for most links. So instead, we try to directly infer a single seasonality variable.
It looks like COVID spreads 42% less [25% – 53%, 95% CI] from the peak of winter to the peak of summer.
Adding this variable improves two of the cutting-edge models of policy effects (as judged by correcting bias in their noise terms).
One interesting side-result: we infer the peak of winter rather than hard-coding it (we set it to the day with the most inferred spread), and this turns out to be 1 January! This is probably a coincidence, but the Gregorian calendar we use was also learned from data (astronomical data)…
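One simple way to encode a single seasonality variable, assumed here purely for illustration, is a sinusoidal multiplier on R: R(t) = R_base · (1 + γ · cos(2π(t − t_peak)/365)). Under that form, a 42% drop from the winter peak to the summer trough fixes the amplitude γ through (1 − γ)/(1 + γ) = 0.58, and the peak day plays the role of the inferred 1 January peak.

```python
import math

peak_to_trough_drop = 0.42  # central estimate quoted above

# Solve (1 - gamma) / (1 + gamma) = 1 - drop for the amplitude gamma.
ratio = 1.0 - peak_to_trough_drop
gamma = (1.0 - ratio) / (1.0 + ratio)

def seasonal_multiplier(day_of_year, peak_day=1):
    """Assumed multiplicative effect of season on R; peak_day=1 stands for 1 January."""
    return 1.0 + gamma * math.cos(2.0 * math.pi * (day_of_year - peak_day) / 365.0)

winter = seasonal_multiplier(1)
summer = seasonal_multiplier(1 + 365 / 2)
print(round(gamma, 3), round(summer / winter, 2))
```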
See also
  • Gavin Leech, Charlie Rogers-Smith, Jonas B. Sandbrink, Benedict Snodin, Robert Zinkov, Benjamin Rader, John S. Brownstein, Yarin Gal, Samir Bhatt, Mrinank Sharma, Sören Mindermann, Jan M. Brauner, Laurence Aitchison

Seasonal variation in SARS-CoV-2 transmission in temperate climates

  • Tomas Gavenciak, Joshua Teperowski Monrad, Gavin Leech, Mrinank Sharma, Soren Mindermann, Jan Markus Brauner, Samir Bhatt, Jan Kulveit
  • Mrinank Sharma, Sören Mindermann, Charlie Rogers-Smith, Gavin Leech, Benedict Snodin, Janvi Ahuja, Jonas B. Sandbrink, Joshua Teperowski Monrad, George Altman, Gurpreet Dhaliwal, Lukas Finnveden, Alexander John Norman, Sebastian B. Oehm, Julia Fabienne Sandkühler, Laurence Aitchison, Tomáš Gavenčiak, Thomas Mellan, Jan Kulveit, Leonid Chindelevitch, Seth Flaxman, Yarin Gal, Swapnil Mishra, Samir Bhatt & Jan Markus Brauner

BIAS Day 4 Review: ‘Data-Driven AI’

This review of the 4th day of the BIAS event, ‘Data-Driven AI’, is written by CDT Student Stoil Ganev.

The main focus for the final day of BIAS was Data-Driven AI. Of the four pillars of the Interactive AI CDT, the Data-Driven aspect tends to have a more “applied” flavour compared to the rest. This is due to a variety of reasons, but most of them can be summed up in the statement that Data-Driven AI is the AI of the present. Most deployed AI algorithms and systems are structured around the idea of data X going in and prediction Y coming out. This paradigm is popular because it easily fits into modern computer system architectures. For all of their complexity, modern at-scale computer systems generally function like data pipelines. One part takes in a portion of data, transforms it and passes it on to another part of the system to perform its own type of transformation. We can see that, in this kind of architecture, a simple “X goes in, Y comes out” AI is easy to integrate, since it will be no different from any other component. Additionally, data is a resource that most organisations have in abundance. Every sensor reading, user interaction or system-to-system communication can be easily tracked, recorded and compiled into usable chunks of data. In fact, for accountability and transparency reasons, organisations are often required to record and track a lot of this data. As a result, most organisations are left with massive repositories of data, which they are not able to fully utilise. This is why Data-Driven AI is often relied on as a straightforward, low-cost solution for capitalising on these massive stores of data. This “applied” aspect of Data-Driven AI was very much present in the talks given on the last day of BIAS. Compared to the other days, the talks of the final day reflected more practical considerations with regard to AI.
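The "X goes in, Y comes out" point can be illustrated with a minimal sketch of a data pipeline in which a model is just another stage. The stages and the threshold "model" below are hypothetical placeholders.

```python
from typing import Callable, Iterable, List

Stage = Callable[[dict], dict]

def run_pipeline(record: dict, stages: Iterable[Stage]) -> dict:
    """Pass a record through each stage in order, like a simple data pipeline."""
    for stage in stages:
        record = stage(record)
    return record

def clean(record: dict) -> dict:
    return {**record, "text": record["text"].strip().lower()}

def extract_features(record: dict) -> dict:
    return {**record, "n_words": len(record["text"].split())}

def predict(record: dict) -> dict:
    # Hypothetical "model": from the pipeline's point of view it is just another
    # X-in, Y-out transformation, so it slots in like any other stage.
    return {**record, "is_long": record["n_words"] > 5}

stages: List[Stage] = [clean, extract_features, predict]
result = run_pipeline({"text": "  Sensor log from unit 7 reports normal operation  "}, stages)
print(result)
```

Swapping the placeholder for a trained model changes nothing about the surrounding stages, which is why this style of AI is so easy to integrate.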

The first talk was given by Professor Robert Jenssen from The Arctic University of Norway. It focused on work he had done with his students on automated monitoring of electrical power lines, more specifically on how to utilise unmanned aerial vehicles (UAVs) to automatically discover anomalies in the power grid. A point he made in the talk was that the amount of time they spent on engineering efforts was several times larger than the amount spent on novel research. There was no off-the-shelf product they could use or adapt, so their system had to be written mostly from scratch. In general, this seems to be a pattern with AI systems: even if the same model is used, the resulting system ends up extremely tailored to its own problem and cannot easily be reused for a different one. They ran into a similar problem with the data set, as well. Given that the problem of monitoring power lines is rather niche, there was no directly applicable data set they could rely on. I found their solution to this problem to be quite clever in its simplicity. Since gathering real world data is rather difficult, they opted to simulate their data set. They used 3D modelling software to replicate the environment of the power lines. Given that most power masts sit in the middle of fields, that environment is easy to simulate. For more complicated problems such as autonomous driving, this simulation approach is not feasible. It is impossible to properly simulate human behaviour, which the AI would need to model, and there is a large variety in urban settings as well. However, for a mast sitting in a field, you can capture most of the variety by changing the texture of the ground. Additionally, this approach has advantages over real world data as well. There are types of anomalies that are so rare that they might simply not be captured by the data gathering process or be too rare for the model to notice them.
However, in simulation, it is easy to introduce any type of anomaly and ensure it is properly represented in the data set. In terms of architecture, they opted to structure the system as a pipeline of sub-tasks, with separate models for component detection, anomaly detection, etc. This piecewise approach is very sensible, given that most anomalies are likely independent of each other. Additionally, the more specific a problem is, the easier and faster it is to train a model for it. However, this approach tends to have larger engineering overheads. With more components, proper communication and synchronisation between them must be ensured and cannot be taken for granted. Also, depending on the length of the pipeline, it might become difficult to ensure that it performs fast enough. In general, I think that the work Professor Jenssen and his students did in this project is very representative of what deploying AI systems in the real world is like. Often your problem is so niche that there are no readily available solutions or data sets, so most of the work has to be done from scratch. Moreover, even when there is little or no need for novel AI research, a problem might still require a large amount of engineering effort to solve.
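The argument for simulated data can be sketched in a few lines of Python: because the data is generated, how often each anomaly appears becomes a setting rather than an accident of data collection. The anomaly labels, ground textures and scene-spec dictionaries below are hypothetical placeholders, not the categories or tooling used in Professor Jenssen's project; a real setup would hand each spec to the 3D modelling software for rendering.

```python
from collections import Counter
from itertools import cycle
from typing import Dict, List

# Hypothetical anomaly labels for simulated power-mast scenes (illustrative only).
ANOMALY_TYPES = ["none", "missing_cap", "cracked_insulator", "bird_nest"]


def simulate_scene_specs(n_scenes: int, ground_textures: List[str]) -> List[Dict[str, str]]:
    """Build scene specifications in which every anomaly type is equally represented.

    Each spec would be handed to a 3D renderer; here we only build the specs.
    Cycling through the anomaly types fixes their frequency by design, instead of
    leaving rare defects to chance as in real-world data collection.
    """
    anomalies = cycle(ANOMALY_TYPES)
    textures = cycle(ground_textures)
    return [
        {"anomaly": next(anomalies), "ground_texture": next(textures)}
        for _ in range(n_scenes)
    ]


scenes = simulate_scene_specs(400, ["grass", "snow", "soil", "crops", "gravel"])
print(Counter(s["anomaly"] for s in scenes))  # each anomaly type appears 100 times
```

Because the 4 anomaly types and 5 textures are cycled independently and 4 and 5 share no common factor, every anomaly also appears on every ground texture (20 times each for 400 scenes), which mirrors the idea of capturing variety by changing the ground texture.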

The second talk of the day was given by Jonas Pfeiffer, a PhD student from the Technical University of Darmstadt, who introduced us to his research on Adapters for Transformer models. Adapters are a lightweight and faster approach to fine-tuning Transformer models for different tasks. The idea is rather simple: Adapters are small layers added between the Transformer layers, and only they are trained during fine-tuning, while the Transformer layers are kept fixed. While simple and straightforward, this approach appears to be rather effective. Beyond his research on Adapters, Jonas is also one of the main contributors to AdapterHub.ml, a framework for training and sharing Adapters. This brings our focus to an important part of getting AI research out of papers and into the real world: creating accessible and easy-to-use programming libraries. As researchers, we often neglect this step or consider it to be beyond our responsibilities, and not without sensible reasons. A programming library is not just the code it contains. It requires training materials for new users, tracking of bugs and feature requests, maintaining and following a development road map, managing integrations with libraries it depends on or that depend on it, etc. All of these aspects require significant effort from the library's maintainers, effort that does not contribute to research output and consequently does not count towards the criteria by which we are judged as successful scientists. As such, it is always a delight to see a researcher willing to go the extra mile to make his or her research more accessible. Jonas's talk also had a tutorial section, where he led us through the process of fine-tuning an off-the-shelf pre-trained Transformer. The tutorial was delivered through Jupyter notebooks easily accessible from the project's website.
Within minutes, we had our own working examples to dissect and experiment with. Given that Adapters and the AdapterHub.ml framework are very recent innovations, the amount and quality of documentation and training resources in this project is highly impressive. Adapters and the AdapterHub.ml framework are excellent tools that, I believe, will be useful to me in the future, so I am very pleased to have attended this talk and to have discovered these tools through it.
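The Adapter idea can be illustrated with a minimal NumPy sketch of a bottleneck adapter (down-projection, non-linearity, up-projection and a residual connection) placed after a frozen layer. This is a generic illustration under assumed sizes (a hidden size of 768 and a bottleneck of 64), with a single `tanh` matrix standing in for a pre-trained Transformer layer and with biases and the training loop omitted; it is not the AdapterHub.ml API.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, bottleneck = 768, 64  # assumed sizes, for illustration only


def frozen_layer(h: np.ndarray, W: np.ndarray) -> np.ndarray:
    # Stand-in for a pre-trained Transformer layer; W is never updated.
    return np.tanh(h @ W)


class Adapter:
    """Bottleneck adapter: down-project, ReLU, up-project, then add the input back."""

    def __init__(self, d_model: int, bottleneck: int, rng: np.random.Generator):
        self.W_down = rng.normal(0.0, 0.02, size=(d_model, bottleneck))
        # Zero-initialising the up-projection makes the adapter start as the identity,
        # so inserting it leaves the pre-trained model's outputs unchanged at first.
        self.W_up = np.zeros((bottleneck, d_model))

    def __call__(self, h: np.ndarray) -> np.ndarray:
        return h + np.maximum(h @ self.W_down, 0.0) @ self.W_up

    def n_params(self) -> int:
        # These are the only weights that fine-tuning would train.
        return self.W_down.size + self.W_up.size


W_frozen = rng.normal(0.0, 0.02, size=(d_model, d_model))
adapter = Adapter(d_model, bottleneck, rng)

h = rng.normal(size=(4, d_model))  # a batch of 4 token representations
base = frozen_layer(h, W_frozen)
out = adapter(base)

print(np.allclose(out, base))  # True: identity at initialisation
print(adapter.n_params(), W_frozen.size)  # 98304 589824
```

Only `W_down` and `W_up` would be updated during fine-tuning. Even against this single stand-in matrix the adapter is about a sixth of the size, and a real Transformer layer contains several such matrices, so the trainable fraction is smaller still.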

The final day of BIAS was an excellent wrap-up to the summer school. With its more applied focus, it showed us how the research we are conducting can be translated to the real world and how it can have an impact. We got a flavour of both what it is like to develop and deploy an AI system and what it is like to provide a programming library for the methods we develop. These are aspects of our research that we often neglect or overlook. Thus, this day served as a great reminder that our research is not something confined within a lab, but work that lives and breathes within the context of the world that surrounds us.

BIAS Day 3 Review: ‘Responsible AI’

This review of the 3rd day of the BIAS event, ‘Responsible AI’, is written by CDT Student Emily Vosper. 

Monday was met with a swift 9:30am start, made easier to digest by Toby Walsh's talk ‘AI and Ethics, why all the fuss?’. This talk, and the subsequent discussion, covered the thought-provoking topic of fairness within AI. The central question was whether we actually need new ethical principles to govern AI, or whether we can take inspiration from well-established areas such as medicine. Medicine works by four key principles (beneficence, non-maleficence, autonomy and justice), and AI brings some new challenges to this framework, including autonomy, decision-making and culpability. There were some interesting discussions around reproducing historical biases when using autonomous systems, for example in the justice system, such as predictive policing or parole decision-making (COMPAS).

The second talk of the day was given by Nirav Ajmeri and Pradeep Murukannaiah on ethics in sociotechnical systems. They defined ethics as distinguishing between right and wrong, which is a complex problem full of ethical dilemmas. One example comes from Les Misérables, in which the protagonist steals a loaf of bread: stealing is obviously bad, but the bread is stolen to feed a child, so the notion of right and wrong becomes nontrivial. Nirav and Pradeep treated ethics as a multiagent concern, with values as the building blocks of ethics. Using this values-based approach, the notion of right and wrong can be more easily broken down in a domain context, i.e. by discovering the main values and social norms of a certain domain, rules can be drawn up to better understand how to reach a goal within that domain. After the talk, there were some thought-provoking discussions about how to facilitate reasoning at both the individual and the societal level, and how to satisfy values such as privacy.

In the afternoon session, Kacper Sokol ran a practical machine learning explainability session, where he introduced the concept of surrogate explainers: explainers that are not model-specific and can therefore be used in many applications. The key takeaways were that such diagnostic tools only become explainers when their properties and outputs are well understood, and that explainers are not monolithic entities: they are complex, with many parameters, and need to be tailor-made or configured for the application at hand.

The practical involved trying to break the explainer. The idea was to move the explainer's meaningful splits so that the regions they define were impure, i.e. each contained points from several different classes predicted by the black-box model. Moving the splits means the explainer no longer captures the black-box model as well, since a mixture of points from several class predictions has been introduced into each region. Based on these insights, it would be possible to manipulate the explainer with very impure hyper-rectangles. We found this was even more likely with the logistic regression model, as it has a diagonal decision boundary, while the explainer only has horizontal and vertical meaningful splits.
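The effect of a diagonal boundary on axis-aligned splits can be reproduced with a toy example in NumPy. The sketch below uses the simplest possible tree surrogate, a single axis-aligned split (a decision stump) chosen to minimise Gini impurity, fitted to the predictions of two hypothetical black boxes: a logistic regression with a diagonal boundary and a rule that is itself axis-aligned. The weights, sample size and candidate thresholds are arbitrary assumptions, and this is a simplified stand-in rather than the tooling used in the session.

```python
import numpy as np


def gini(labels: np.ndarray) -> float:
    """Gini impurity of binary labels (0 = pure region, 0.5 = evenly mixed)."""
    if labels.size == 0:
        return 0.0
    p = labels.mean()
    return 1.0 - p**2 - (1.0 - p) ** 2


def best_axis_aligned_split(X: np.ndarray, y: np.ndarray):
    """Find the single horizontal or vertical split minimising weighted Gini impurity."""
    best = (np.inf, None, None)  # (weighted impurity, feature index, threshold)
    for feature in range(X.shape[1]):
        for threshold in np.linspace(0.05, 0.95, 19):
            left = y[X[:, feature] <= threshold]
            right = y[X[:, feature] > threshold]
            score = (left.size * gini(left) + right.size * gini(right)) / y.size
            if score < best[0]:
                best = (score, feature, threshold)
    return best


rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(2000, 2))

# Black box 1: logistic regression whose decision boundary is the diagonal x1 + x2 = 1.
w, b = np.array([4.0, 4.0]), -4.0
diagonal_preds = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(float)

# Black box 2: the axis-aligned rule x1 > 0.5, which one split can match exactly.
axis_preds = (X[:, 0] > 0.5).astype(float)

for name, preds in [("diagonal", diagonal_preds), ("axis-aligned", axis_preds)]:
    impurity, feature, threshold = best_axis_aligned_split(X, preds)
    print(f"{name}: feature {feature}, threshold {threshold:.2f}, weighted Gini {impurity:.3f}")
```

For the diagonal boundary, even the best stump leaves both regions mixed, with a weighted Gini impurity well above zero (analytically about 0.375 for uniformly sampled points), whereas the axis-aligned black box is matched almost perfectly. Deeper trees reduce but never remove this staircase error, which is why a diagonal boundary makes impure hyper-rectangles so easy to produce.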