j.crabbe – Page 2 – Interactive AI CDT Blog

BIAS ’23 – Day 3: Prof. Kerstin Eder talk – (Trustworthy Systems Laboratory, University of Bristol) The AI Verification Challenge

Posted on September 14, 2023September 15, 2023 by j.crabbe

This blog post is written by AI CDT student, Isabella Degen

A summary of Prof. Kerstin Eder’s talk on the well-established procedures and practices of verification and validation (V&V) and how they relate to AI algorithms. The objective is to inspire the readers to apply better V&V processes to their AI research.

Verification is the process used to gain confidence in the correctness of a system compared to its requirements and specifications. Validation is the process used to assess if the system behaves as intended in its target environment. A system can verify well, meaning it does what it was specified to do, and not validate well, meaning it does not behave as intended.

V&V are challenging for systems that fully or partially involve AI algorithms despite V&V being a well-established and formalised practice. Many AI algorithms are black boxes that offer no transparency about how the algorithm operates. They respond with multiple correct answers to similar or even the same input. AI algorithms are not deterministic by design. Ideally, they can handle new situations well without needing to be trained for all situations. Therefore, accurately and exhaustively listing all the requirements against which these algorithms need to be verified is practically impossible.

V&V methods for complex robotic systems like automated vehicles are well-established. Automated vehicles need to be capable of operating in an environment where unexpected situations occur. Various ISO standards (ISO 13485 – Medical Devices Quality Management, ISO 10218-1 – Robots and Robotic Devices, ISO 12207 – Systems and Software Engineering) describe different V&V practices required for software, systems and devices. These standards expect the use of multiple processes and practices to meet the required quality. No one practice covers the extent of V&V each practice has shortcomings. The three techniques for V&V are formal verification, simulation-based verification and experiments [3]. The image below arranges these techniques by how realistic and coverable they are, where coverability refers to how much of the system a technique can analyse [1].

The image shows the framework for corroborative V&V [1].

An approach for simulation-based testing is coverage-driven verification (CDV). A two-tiered test generation approach where abstract test sequences are computed first and then concretised has been shown to achieve a high level of automation [2]. It is important to note that coverage includes code coverage, structural coverage (e.g. employing Finite State Machines) and functional coverage (including requirements and situations).

The images show the CDV process (left) and its translation to an automated vehicle scenario (right) [2].

Belief-desire-intention (BDI) agents used as models can further generate tests. These agents achieve coverage that is higher or equivalent to model-checking automata. The BDI agents can emulate the agency present in Human-Robot Interactions. However, the cost of learning a belief set has to be considered [3]. Similarly, software testing agents can be used to generate tests for simulation-based automated vehicle verification. Such an agency-directed approach is robust and efficient. It generates twice as many effective tests compared to pseudo-random test generation. Moreover, these agents are encoded to behave naturally without compromising the effectiveness of test generation [4].

The hope is that inspired by these techniques used to test robotic systems we will promote V&V to first-class citizens when designing and implementing AI algorithms. V&V for AI algorithms requires innovation and a creative combination of existing techniques like intelligent agency-based test generation. The reward will be to increase trust in AI algorithms.

References:

[1] Webster, Matt, et al. “A corroborative approach to verification and validation of human–robot teams.” The International Journal of Robotics Research 39.1 (2020): 73-99. https://journals.sagepub.com/doi/full/10.1177/0278364919883338

[2] Araiza-Illan, Dejanira, et al. “Systematic and realistic testing in simulation of control code for robots in collaborative human-robot interactions.” Towards Autonomous Robotic Systems: 17th Annual Conference, TAROS 2016, Sheffield, UK, June 26–July 1, 2016, Proceedings 17. Springer International Publishing, 2016. https://link.springer.com/chapter/10.1007/978-3-319-40379-3_3

[3] Araiza-Illan, Dejanira, Anthony G. Pipe, and Kerstin Eder. “Model-based test generation for robotic software: Automata versus belief-desire-intention agents.” arXiv preprint arXiv:1609.08439 (2016). https://arxiv.org/abs/1609.08439

[4] Chance, Greg, et al. “An agency-directed approach to test generation for simulation-based autonomous vehicle verification.” 2020 IEEE International Conference On Artificial Intelligence Testing (AITest). IEEE, 2020. https://arxiv.org/abs/1912.05434

Essai 2023 Summer School – Matt Clifford

Posted on August 23, 2023September 14, 2023 by j.crabbe

This blog post is written by AI CDT student, Matt Clifford

ESSAI 2023 – https://essai.si/

A few of us from the CDT – Me (Matt), Jonny and Rachael attended the ESSAI summer school on the 24th -28th of July 2023. ESSAI is the first European summer school on Artificial Intelligence and was held in Ljubljana, Slovenia. There were a variety of interesting topics and classes on offer (https://essai.si/schedule/) but here I’ll share some of the classes that I attended. I’ll keep the information brief of each topic here but feel free to reach out to me if you would like to chat through any of the topics which might be useful to you or if would like to know more!

AutoML – https://www.automl.org/

Optimise machine learning algorithm hyperparameters and Neural architectures automatically by using various techniques (Baysian optimisation etc.) Python packages for sklearn and pytorch: https://pypi.org/project/smac/

https://github.com/automl/Auto-PyTorch

Very useful when you want a more objective training approach which will save you time, computation and more importantly frustration!

Learning Beyond Static Datasets – https://owll-lab.com/

Exploring mechanisms to help catastrophic forgetting when learning a new task in ML.

Topics related to: transfer learning, active learning, continual learning, lifelong learning, curriculum learning, open world learning, knowledge distillation.

A nice survey paper to map out the whole landscape – https://www.sciencedirect.com/science/article/pii/S089360802300014X?via%3Dihub

Uncertainty Quantification

Adding uncertainty to a model (important with neural networks being so overly confident!). Methods can either be inherent (Bayesian NN etc.) or post hoc (calibration, ensembling, Monte-Carlo dropout) and can disentangle aleatoric and epistemic uncertainty measures.

Fairness & Privacy –

https://aif360.readthedocs.io/en/latest/

https://fairlearn.org/

The president of Slovenia (plus her not so inconspicuous bodyguards) attended these talks which was a bit of a surprise!

Explored navigating the somewhat conflicting landscape of statical fairness by ensuring groups of people have the same model statistics. Picking which statistics, however, not so easy and it’s impossible to ensure all statistics match in real life scenarios – https://arxiv.org/pdf/2304.06057.pdf .

Also looked at privacy through anonymity (K-anonymity, L-diversity, T-closeness) and differential privacy. I won’t go into details but thought I’d mention some of the main techniques currently used in academic and industry.

Again, let me know if you want to go into the details of anything that is useful or interesting to you!

Also, a side note, Slovenia is an amazingly beautiful country, and I can very much recommend to anyone thinking of going! Here’s a few photos:

AI UK 2023 Conference – Rachael Laidlaw

Posted on April 20, 2023April 21, 2023 by j.crabbe

This blog post is written by AI CDT student, Rachael Laidlaw

Last month, I took the exciting opportunity to attend AI UK 2023, a large-scale event organised by The Alan Turing Institute. It was my first conference outside of Bristol, held in the heart of London at the Queen Elizabeth II Centre – right by Westminster Abbey and Big Ben – and it promised to offer a diverse programme of activities with a broad range of interactive content. As such, the sessions were packed with novel material delivered by leading international thinkers across multiple disciplines, resulting in an in-depth exploration of how data science and AI can be used to solve real-world challenges.

On the day

After a short walk to the venue from my hotel in Piccadilly Circus, I signed in and collected my demonstrator lanyard before heading up to the third floor of the building to meet my colleagues from the Jean Golding Institute. We would be spending the day manning a stall for the Local Air initiative in the environmental section of the Fleming room, engaging with attendees from both academia and industry about a pollution monitoring system designed to be mounted on e-scooters.

Highlights included:

using ground coffee to simulate particulate matter in the air and generate a live response from the prototype which was shown on the screen behind us,
contemplating alternative applications for the noise-pollution sound sensors (i.e., for use in the study of bats) with representatives from the UK Centre for Ecology and Hydrology, and
considering media coverage possibilities for the project with a journalist from the Financial Times.

Into the afternoon

When lunchtime arrived, I began circling the floor to visit the other stalls. Whilst wandering, I encountered displays of lots of innovative concepts, some of my favourites being:

a family of domestic social robot pets developed by the company Konpanion to alleviate loneliness,
progress on the tool BoneFinder, created by academics at University of Manchester for use in clinical practice to segment skeletal structures,
a cardiac digital twin produced at King’s College London,
SketchX’s headset that gives you the ability to build your own metaverse from rough virtual drawings, and
the Data Hazards project, complete with holographic stickers and hi-vis jackets worn by another University of Bristol team to really bring data-oriented risk assessments to life.

Of the above, BoneFinder stood out to me in particular, owing to the fact that my current specialist focus is ecological computer vision, and, thus, seeing the same sort of technique being used for a medical application piqued my interest.

The talks

During a quiet period at the stall, I jumped at the chance of sitting in on a very well-attended talk by Gary Marcus from NYU on the power of ChatGPT and the unknowns surrounding the future of such pieces of technology. This was especially thought-provoking and relevant to my ongoing work towards a potential CHI publication.

After re-energising with some delicious cookies in the break, I also made it to an insightful panel discussion on shaping public perceptions of artificial intelligence, featuring Tracey Brown (the director of Sense About Science), Tania Duarte (the co-founder and CEO of We and AI) and David Leslie (a specialist in ethics and responsible innovation). This reminded me of the importance of keeping stakeholders in mind during all stages of research.

Closing moments

To round off the day, everyone came together to mingle and expand their networks over canapés and a significant amount of complimentary wine. We then gathered our belongings and headed out for dinner and to be tourists in London for the evening.

All in all, it was an incredibly fun and informative experience alongside a great team, and I’m already looking forward to future conferences!

Highlights from NeurIPS 2022 and the 2nd Interactive Learning for NLP Workshop – Dr Edwin Simpson

Posted on March 21, 2023March 19, 2023 by j.crabbe

This blog post is written by lecturer in Computer Science, Dr Edwin Simpson

In November I was lucky enough to attend NeurIPS 2022 in person in New Orleans, and take part as a co-organiser of InterNLP, our second interactive learning for NLP workshop. I had many interesting discussions around posters, talks and coffee breaks and took loads of photos of posters. It was hard to write up my highlights and without the post becoming endlessly long, so here is my attempt to pick out a handful of papers that caught my eye and tell you a little bit about how our workshop unfolded.

Main Conference

One topic generating a lot of buzz was in-context learning, where language models learn to perform new tasks without updating their weights from examples given in the model’s input prompt. Models like GPT3 can perform in-context learning from small numbers of examples. Garg et al. presented an interesting paper that triez to understand what classes of functions can be learned in this way [1]. They were able to train Transformers that learn function classes including linear functions and two-layer neural networks.

However, for few-shot learning, in-context learning may not be the best solution: Liu et al. [2] showed that fine-tuning a model by introducing a small number of additional weights can be cheaper and produce more accurate models.

Another interesting NLP paper from Jian, Gao and Vosoughi [3] learns sentence embeddings usingimage and audio data alongside a text training set. The method works by creating pairs of images (or audio) using data augmentation, which are then embedded and fed through a BERT-like transformer to provide additional data for contrastive learning. This is especially useful for low-resource languages and domains, and it is really interesting that we can learn from different modalities without any parallel examples.

Many machine learning researchers are concerned with models that produce well-calibrated probabilities, but what difference does calibration make to end users? Vodrahalli, Gerstenberg and Zou [4] investigated a binary prediction task in which a classifier provides advice ta user, along with its confidence. They found that exaggerating the model’s confidence led the user to perform better. So, the classifier was uncalibrated and had higher training loss but the complete human-AI system was more effective, which shows how important it is for ML researchers to consider real-world use cases for their models.

Sticking with the topic of uncertainty, Bayesian deep learning aims to quantify uncertainty in complex neural network models, but is challenging to apply as it is difficult to specify a suitable prior distribution. Ideally, we’d specify a prior over the functions that the network encodes, rather than over individual network weights. Tran et al. [4] introduce a method for setting functional priors in Bayesian neural networks, by aligning them with Gaussian processes. It will be interesting to try out their approach in some deep learning applications where quantifying uncertainty is important.

At the poster sessions, I also enjoyed learning about the wide range of new benchmarks and datasets that will enable lots of exciting future work. For example, one that relates to my own work that I’d like to make use of is BIGBIO [5], which makes a number of biomedical NLP datasets more accessible and will hopefully to more reproducible results.

Juho Kim, who is associate professor at Korea Advanced Institute of Science and Technology (KAIST), gave a keynote on his vision of Interaction-Centric AI. He called on AI researchers to move beyond data-centric or model-centric research by rethinking the complete AI research process around the user experience of AI. Juho’s talk gave examples of how an interaction-centric approach may affect the way we evaluate models, which cases we focus on when trying to improve accuracy, how to incentivise users to engage with AI, and several other aspects of interaction-centric AI that his lab has been working on. He demonstrated Stylette, a tool that lets you use natural language to change the appearance of a website. The keynote ended with a call to action for AI researchers to rethink performance metrics, the design process and collaboration, particularly with HCI researchers.

Geoff Hinton appeared remotely from home to present the Forward-Forward algorithm, a method for training neural networks without backpropagation that could give insights into how learning in the cortex takes place. His experiments showed some promising early results, and in the Q&A Geoff talked about coding the experiments himself. A preliminary arXiv paper is now out [6].

1. Garg et al., What Can Transformers Learn In-Context? A Case Study of Simple Function Classes, https://arxiv.org/abs/2208.01066

2. Liu et al., Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning, https://arxiv.org/abs/2205.05638

3. Jian, Gao and Vosoughi, Non-Linguistic Supervision for Contrastive Learning of Sentence Embeddings, https://arxiv.org/pdf/2209.09433.pdf

4. Vodrahalli, Gerstenberg and Zou, Uncalibrated Models Can Improve Human-AI Collaboration, https://arxiv.org/abs/2202.05983

5. Fries et al., BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing, https://arxiv.org/abs/2206.15076

6. Hinton, The Forward-Forward Algorithm: Some Preliminary Investigations, https://arxiv.org/abs/2212.13345

InterNLP Workshop

2022 was our second edition of the InterNLP workshop, and we were very happy that the community grew, this year with 20 accepted papers and a chance to meet in person! Some of the videos are on youtube at https://www.youtube.com/@InterNLP. Others will hopefully be available soon on the NeurIPS archives

The programme was packed with impressive invited talks from Karthik Narasimhan (Princeton), John Langford (Microsoft), Dan Weld (UWashington), Anca Dragan (UCBerkeley) and Aida Nematzadeh (DeepMind). To pick out just a couple, Karthik presented recent work on semantic supervision [1] for few-shot generalization and personalization, which learns from semantic descriptions of classes, providing a way for instruct models through text. Anca Dragan talked about interactive agents that go beyond following instructions about how exactly to perform a task, to inferring the user’s goals, preferences, and constraints. She emphasized that the way people refer to desired actions provides important information about their preferences, and therefore we can infer, from a user’s language, reward functions that reflect their preferences. Aida Nematzadeh compared self-supervised pretraining to language learning in childhood, which involves interacting with other people. Her talk focused on the evaluation of neural representations, and she called for real-world evaluations, strong baselines and probing to provide a much more thorough way of uncovering the strengths and weaknesses of pretrained models.

The contributed talks and posters showcased a wide range of work from human-in-the-loop learning techniques to software libraries and benchmark datasets. For example, PyTAIL [2] is a Python library for active learning that collects new labelling rules and customizes lexicons as well as collecting labels. Mohanty et al. [3] developed the IGLU challenge, in which an agent has to perform tasks by following natural language instructions; their presentation at InterNLP explained how they collected the data. The RL4M library [4] provides a way to optimize language generation models using reinforcement learning, as a way to adapt to human preferences; the paper [4] also presents a benchmark, GRUE, for evaluating RL methods for language generation. Majumder and McAuley [5] investigate the use of explanations to debias NLP models while maintaining a good trade-off between predictive performance and bias mitigation.

At the end of the day, I got to ask a lot of questions to some very smart people during our panel discussion – thanks to John Langford, Karthik Narasimhan, Aida Nematzadeh, and Alane Suhr for taking part, and thanks to the audience for some great interactions too. The wide-ranging discussion touched on the evaluation of interactive systems (how to use static data for evaluation, evaluating how well models adapt to user input), working with researchers and users from other fields, different forms of interaction besides language, and challenges that are specific to interactive NLP.

We plan to be back at a future conference (not sure which one yet!) for the next iteration of InterNLP. Large language models and in-context learning are clearly revolutionizing this space in some ways, but I’m convinced we still have a lot of work to do to design interactive machine learning systems that are accountable, reliable, and require fewer resources.

Thank you to Nguyễn Xuân Khánh for letting us include his InterNLP workshop photos.

1. Aggarwal, Deshpande and Narasimhan, SemSup-XC: Semantic Supervision for Zero and Few-shot Extreme Classification, https://arxiv.org/pdf/2301.11309.pdf

2. Mishra and Diesner, PyTAIL: Interactive and Incremental Learning of NLP Models with Human in the Loop for Online Data, https://internlp.github.io/documents/2022/papers/24.pdf

3. Mohanty et al., Collecting Interactive Multi-modal Datasets for Grounded Language Understanding, https://internlp.github.io/documents/2022/papers/17.pdf

4. Ramamurthy et al., Is Reinforcement Learning (Not) for Natural Language Processing?: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization, https://arxiv.org/abs/2210.01241

5. Majumder and McAuley, InterFair: Debiasing with Natural Language Feedback for Fair Interpretable Predictions, https://arxiv.org/abs/2210.07440

2023 AAAI Conference Blog – Amarpal Sahota

Posted on February 22, 2023February 23, 2023 by j.crabbe

This blog post is written by AI CDT Student Amarpal Sahota

I attended the 37^th AAAI Conference on Artificial Intelligence from the 7^th of February 2023 to the 14^th February. This was my first in person conference and I was excited to travel to Washington D.C.

The conference schedule included Labs and Tutorials February 7^th – 8^th , the main conference February 9^th – 12^th followed by the workshops on February 13^th – 14^th.

Arriving and Labs / Tutorials

I arrived at the conference venue on 7^th February to sign in and collect my name badge. The conference venue (Walter E. Washington Convention Center) was huge and had within it everything you could need from areas to work or relax to restaurants and of course many halls / lecture theatres to host talks.

I was attending the conference to present a paper at the Health Intelligence Workshop. Two of my colleagues from the University of Bristol (Jeff and Enrico) were also attending to present at this workshop (we are pictured together below!).

The tutorials were an opportunity to learn from experts on topics that you may not be familiar with yourself. I attended tutorials on Machine Learning for Causal Inference, Graph Neural Networks and AI for epidemiological forecasting.

The AI for epidemiological forecasting tutorial was particularly engaging. The speakers were very good at giving an overview of historical epidemiological forecasting methods and recent AI methods used for forecasting before introducing state of the art AI methods that use machine learning combined with our knowledge of epidemiology. If you are interested, the materials for this tutorial can be accessed at : https://github.com/AdityaLab/aaai-23-ai4epi-tutorial .

Main conference Feb 9^th – Feb 12^th

The main conference began with a welcome talk in the ‘ball room’. The room was set up with a stage and enough chairs to seat thousands. The welcome talk introduced included an overview of the different tracks within the conference (AAAI Conference of AI, Innovative Application of AI, Educational Advances in AI) , statistics around conference participation / acceptance and introduced the conference chairs.

The schedule for the main conference each day included invited talks and technical talks running from 8:30 am to 6pm. Each day this would be followed by a poster session from 6pm – 8pm allowing us to talk and engage with researchers in more detail.

For the technical talks I attended a variety of sessions from Brain Modelling to ML for Time-Series / Data Streams and Graph-based Machine Learning. Noticeably, all of the sessions were not in person. They were hybrid, with some speakers presenting online. This was disappointing but understandable given visa restrictions for travel to the U.S.

I found that many of the technical talks became difficult to follow very quickly with these talks largely aimed at experts in the respective fields. I particularly enjoyed some of the time-series talks as these relate to my area of research. I also enjoyed the poster sessions that allowed us to talk with fellow researchers in a more relaxed environment and ask questions directly to understand their work.

For example, I enjoyed the talk ‘SVP-T: A Shape-Level Variable-Position Transformer for Multivariate Time Series Classification‘ by PhD researcher Rundong Zhuo. At the poster session I was able to follow up with Rundong to ask more questions and understand his research in detail. We are pictured together below!

Workshops Feb 13^th – 14^th

I attended the 7th International Workshop On Health Intelligence from 13^th to 14^th February. The workshop began with opening remarks from the Co-chair Martin Michalowski before a talk by our first keynote speaker. This was Professor Randi Foraker who spoke about her research relating to building trust in AI for Improving Health Outcomes.

This talk was followed by paper presentations with papers on related topics grouped into sessions. My talk was in the second session of the day titled ‘Classification’. My paper (pre-print here) is titled ‘A Time Series Approach to Parkinson’s Disease Classification from EEG’. The presentation went reasonably smoothly and I had a number of interesting questions from the audience about applications of my work and the methods I had used. I am pictured giving the talk below!

The second half of the day focused on the hackathon. The theme of the hackathon was biological age prediction. Biological ageing is a latent concept with no agreed upon method for estimation. Biological age tries to capture a sense of how much you have aged in the time you have been alive. Certain factors such as stress and poor diet can be expected to age individuals faster. Therefore two people of the same chronological age may have different biological ages.

The hackathon opened with a talk on biological age prediction by Morgan Levin (The founding Principal Investigator at Altos Labs). Our team for the hackathon included four people from the University of Bristol – myself , Jeff , Enrico and Maha. Jeff (pictured below) gave the presentation for our team. We would have to wait until the second day of the conference to find out if we won one of the three prizes.

The second day of the workshop consisted of further research talks, a poster session and an awards ceremony in the afternoon. We were happy to be awarded the 3^rd place prize of $250 for the hackathon! The final day concluded at around 5pm. I said my good byes and headed to Washington D.C. airport for my flight back to the U.K

Through the AI of the storm – Emily Vosper at the Allianz climate risk award 2022

Posted on December 15, 2022December 15, 2022 by j.crabbe

This blog post is written by CDT Student Emily Vosper

This December I travelled to Munich, Germany, to take part in the Allianz climate risk award. Allianz set up this initiative to acknowledge the work done by young scientists who aim to build resilience to and/or reduces the risk of extreme weather events that are exacerbated by climate change. The award is open to PhD candidates and post-doctoral researchers who first submit an essay that outlines their work and the top four are invited to Munich where they present to the Allianz team.

In previous years, finalists have been working on very different climate hazards, but by chance this year the finalists all came from a tropical cyclone and/or flooding background. The finalists consisted of Mona Hemmati (Columbia University) who is a postdoctal researcher specialising in flood-related risks in tropical cyclones, Peter Pfeiderer (Humboldt University Berlin) whose work includes studying seasonal forecasts of tropical cyclones and Daniel Kahl (University of California) who studies flood exposure on a demographic level to understand community vulnerability for his PhD.

On Monday evening, the finalists were invited to meet the Allianz climate risk team at a Bavarian tapas bar. This evening was a great opportunity to get to know a bit about each other in a more relaxed setting, and a chance to sample some of the local cuisine!

On Tuesday, we met at the Allianz offices for the award day. With an excited buzz in the air, the event commenced with a keynote talk by Dr. Nicola Ranger, Oxford University, who spoke on the need to implement climate resilient finance strategies and during the Q and A session there was active discussion on how this could be achieved effectively. We also heard from Chris Townsend, a member of the board of management for Allianz SE, who introduced us to Allianz’ legacy and highlighted the exciting work going on in the climate risk space. We then heard engaging talks from Mona and Peter before a coffee break, followed by an articulate talk from Daniel. As the final speaker, I rounded off the presentation with my talk about how I’ve been using a generative adversarial network to enhance the resolution of tropical cyclone rainfall data. All presentations were followed by a group Q and A session where we discussed the exciting possibility of a collaboration between the four of us as our projects are very complimentary in nature.

With the award in its sixth year, there is now an alumni network of previous finalists rich with expertise in climate hazards and ample opportunity for future collaboration, so watch this space!

Left to Right: Holger Tewes-Kampelmann (CEO Allianz Reinsurance), Peter Pfeiderer (Humboldt University Berlin), Dr. Sibylle Steimen (MD Advisory & Services, Allianz Reinsurance), Emily Vosper (University of Bristol), Mona Hemmati (Columbia University), Daniel Kahl (UC Irvine), Chris Townsend (Member of the Board of Management, Allianz SE) and Dr. Nicola Ranger (Smith School of Enterprise and the Environment, Oxford University).

BIAS 22 Review Day 1 – Daniel Bennett “Complexity and Embodiment in Human Computer Interaction”

Posted on November 23, 2022November 23, 2022 by j.crabbe

This blog post is written/edited by CDT Students Amarpal Sahota and Oliver Deane

This was a thought provoking starting point and one that clearly has a large impact on human computer interaction. Daniel stated that this is a line of research in psychology, cognitive science, and robotics, that has run somewhat parallel to mainstream psychology.

One of the initiators of this was James J Gibson. Gibson and others in the last 70 years did a lot of work on how we use resources outside of just the brain, in our environment and in our bodies, and coordinate all of these together to behave effectively. Daniel stated that with the lens of embodiment we start focusing on processes, interactions, and relations, and the dynamics that follow and this is primarily a change in how we model things.

Therefore, to summarize one could consider the traditional cognitive model as a linear system. First we sense the world, then we form a representation of that world in our brain. Then the representation gets processed through a bunch of neatly defined modules, updates existing plans and intentions, and results in some representation of an action, which we then carry out. The embodied view is more complex as we are not simply in the world but also a part of it. The world is changing constantly, and our behaviour and cognition is simply another physical process in this world.

At a high-level embodied approaches consider behaviour in the world as a kind of continual adjustment and adaptation, with most behaviours are grounded in a kind of improvisatory, responsive quality. Daniel shared a good example of this from Lucy Suchman related to canoeing where you may have an idea of your plan as you look down the river ‘I need to stay left there, slow down over there’ but at execution time you have to adapt your plan.

Daniel stated that a lot of work has been done observing a wide range of human behaviours, from technology interaction, to manning air-traffic control centres and crewing ships. In all of these contexts it is argued that our embodied skills – our adaptation and our implicit skills of coordination with the mess of the situation as it plays out – are the most important factor in determining outcomes.

Human Computer Interaction is increasingly focused on complex behaviours. Daniel talked about the idea that we’re going to do more and more in augmented reality and virtual reality. Computing will be integrated to support a wide range of everyday behaviours, which are not conventionally “cognitive” – you’re not sitting and thinking and making only very small movements with your fingers on a keyboard.

Daniel has a particular interest in musical performance and coordination of musicians. His perspective is that musical performance with technology, technology supported sports training and gaming, particularly team multiplayer games, are cases where static models of cognition seem to break down. He believes modelling in terms of processes and synchronization has great power.

Daniel then spoke about how interaction effects are important in Human Computer Interaction. Firstly, giving the example that notifications influence a person to use their phone. Secondly, the more a person uses their phone the more they cause notifications to appear. He posed the interesting question, how does one disentangle this hypothesis to find out the degree to which notifications influence us?

Daniel then spoke about how reciprocal, interaction dominant effects also play a significant role in the organisation of our individual skilled behaviour. He gave us an overview of his own research where he found evidence of interaction dominant coordination processes in a simple skilful game task, where users are asked to control a cursor to herd sheep.

BIAS 22 – Review day 2 keynote – Prof. Liz Sonenberg: “Imperfectly rational, rationally imperfect, or perfectly irrational…”

Posted on October 21, 2022October 21, 2022 by j.crabbe

Imperfectly rational, rationally imperfect, or perfectly irrational: challenges for human-centered AI keynote by Prof. Liz Sonenberg

This blog post is written/edited by CDT Students Isabella Degen and Oliver Deane

Liz opened the second day of BIAS 22 with her thought-provoking and entertaining keynote speech about automatic decision-making aids. She demonstrated how we humans make perfectly irrational decisions and spoke about the implications of using Explainable Artificial Intelligence (XAI) for better decision-making. Liz’s talk mentioned a great body of research spanning psychology, mathematics, and computer science for which she kindly provides all the references here https://tinyurl.com/4njp563e.

Starting off, Liz presented research demonstrating how subtle influences in our life can change the decisions we make despite us thinking that we are making them completely rationally. What we believe is human rational decision-making in fact is littered with cognitive biases. Cognitive bias is when we create a subjective reality based on a pattern we perceive regardless of how representative that pattern is of all the information. Anchoring is a type of cognitive bias that happens when a decision of a person is influenced by an anchor such as a random number being shown while the person knows that they are being shown a random number that has nothing to do with their decision. An example Liz shared is an experiment by Englich et al who used irrelevant anchors to change experts’ decision-making. In the experiment young judges were asked to discover the length of the sentence for a theft crime by throwing a dice. Unkown to the judges the dice was rigged: for one group of judges it would throw high numbers, for the other it would throw low numbers. The judges knew that throwing a dice should not influence their decision. However, the result was that the group with the dice giving low numbers gave a 5 months sentence while the group with the dice giving high numbers gave an 8 months sentence. This is not the only kind of cognitive bias. Human decision making also suffers from framing bias where the way in which data is presented can affect the decision we make. As well as confirmation bias where we tend to interpret new information as a confirmation of our existing beliefs without considering that we only ever observe a limited kind of information and so forth. With these examples Liz made us doubt how clearly and rationally we humans can make decisions.

The irrationality of humans is an interesting challenge to consider for researchers attempting to create intelligent systems that help us humans make better decisions. Should we copy the imperfect human rationality in intelligent agents, or should we make them more rational than humans? And what does that mean for interactions between human and intelligent systems? Research shows that it is important that human operators have a sense of what the machine is doing to be able to interact with it. From accidents such as the Three Mile Island’s partial meltdown of a nuclear reactor, we can learn how important it is to design systems in a way that does not overwhelm the human operator with information. The information presented should be just enough to enable an operator to make a high-quality decision. It should help the operator to know when they can trust the decision the machine made and when to interrupt. When designing these systems, we need to keep in mind that people suffer from biases such as automation bias. Automation bias happens when a human cannot make a decision based on the information the machine provides and instead decides to just go with the machine’s decision knowing that the machine is more often right than the human. Sadly, this means that a human interacting with a machine might not be able to interrupt the machine at the right moment. We know that human decision-making is imperfectly rational. And while automation bias appears to be an error, it is in fact a rational decision in the context of limited information and time available to the human operator.

One promise of XAI is to use explanations to counteract various cognitive biases and with that help a human operator to make better decisions together with an intelligent system. Liz made a thought-provoking analogy to the science of magic. Magicians use our limited memory and observation abilities to manipulate our feelings and deceive us and make the impossible appear possible. A magician knows that the audience tries to spot how the trick works. And on the other hand, the audience also knows that the magician tries to deceive them and that they are trying to discover how the trick works. Magicians understand their audience well. They know what humans really do and exploit the limited resources they have. Like in magic human-centered AI systems ought to anticipate how perfectly irrational we humans make decisions to enable us to make better decisions and counteract our biases.

BIAS 22 – Review day 2 talk – Dr Oliver Ray: “Knowledge-driven AI”

Posted on October 18, 2022October 18, 2022 by j.crabbe

This blog post is written/edited by CDT Students Daniel Collins and Matt Clifford

BIAS 22 – Day 2, Dr Oliver Ray: “Knowledge-driven AI”

The second talk of day two was delivered by Dr Oliver Ray (University of Bristol), on the topic of human-in-the-loop machine learning using Inductive Logic Programming (ILP) and its application in cyber threat elucidation.

Cyber threat elucidation is the task of analysing network activity to identify ransomware attacks, and to better understand how they unfold. Ransomware is a type of malware which infects victims’ devices, encrypts their data, and demands money from them to restore access. Infection typically occurs through human error. For example, a person may be unwittingly tricked into downloading and running a “trojan” – malware that has been disguised as a legitimate and benign file. The executed ransomware encrypts data, and backups of that data, on the infected system, and the perpetrator can then demand a ransom payment for decryption services. However, ransomware does not always start encrypting data immediately. Instead, it may lay relatively dormant whilst it spreads to other networked systems, and spend time gathering sensitive information, and creating back-ups of itself to block data recovery. If an attack can be identified at this stage or soon after it has started encrypting data, it can be removed before most of the data has been affected.

Ransomware is a persistent threat to cyber security, and each new attack can be developed to behave in unpredictable ways. Dr Ray outline the need for better tools to prepare for new attacks – when faced with a new attack, there should be systems to help a user understand what is happening and what has happened already so that ransomware can be found and removed as quickly as possible, and relevant knowledge can be gained from the attack.

To identify and monitor threats, security experts may perform forensic analysis of Network Monitoring Systems (NMS) data from around the time of infection. This data exists in the form of network logs – relational databases containing a time-labelled record of events and activity occurring across the networked systems. However, there are very large amounts of log data, and most of it is associated with benign activity, unrelated to the threat, making it difficult to find examples of malicious activity. Further, in the case of new threats, there are little to no labelled examples of logs known to be related to an attack. Human knowledge and reasoning are therefore crucial for identifying relevant information in the logs.

ILP based machine learning (ML) was then presented by Dr Ray as a promising alternative to more ‘popular’ traditional ML methods for differentiating ransomware activity from benign activity in large network logs. This is because ILP is better suited for working with relational data, an area where deep learning and traditional ML methods can struggle since often require tabular or vectorisable data formats. ILP not only gives the ability to make predictions on relational data, but it also produces human interpretable logic rules through which it is possible to uncover and learn about the system itself. This could provide valuable insights into how the infection logs are generated, and which features of the logs are important for identification, as opposed to guessing which features might be important.

Dr Ray went on to detail the results of his work with Dr Steve Moyle (Amplify Intelligence UK and Cyber Security Centre, University of Oxford), on a novel proof-of-concept for an ILP based “eXplanatory Interactive Relational Machine Learning” (XIRML) system called “Acuity”. This human-in-the-loop system allows ILP and cyber security experts to direct the cyber threat elucidation process, through interactive functionality for guided data-caching on large network logs, and hypothesis-shaping for rebutting or altering learned logic rules.

In his concluding remarks, Dr Ray shared his thoughts on the future of this technology. As he sees it, the goal is to develop safe, auditable systems that could be used in practice by domain experts alone, without the need for an ILP expert in the loop. To this end, he suggests that system usability and human-interpretable outputs are both crucial factors for the design of future systems.

BIAS 22 – Review Day 2 talk – Dr Nirav Ajmeri: “Ethics in Sociotechnical Systems'”

Posted on September 13, 2022October 18, 2022 by j.crabbe

This blog post is written/edited by CDT Students Jonathan Erkine and Jack Hanslope

Following from a great keynote by Liz Sonenberg, Dr Nirav Ajmeri presented a discussion on Ethics in Socio-Technical Systems (STS).

As is common practice in discussions on AI, we began by looking inwards to what kind of human behaviour we are trying to replicate – what aspect of intelligence have we defined as our objective? In this case it was the ability of machines to make ethical decisions. Dr. Ajmeri referred to Kantian and Aristotelian ethical frameworks which describe moral duty and virtuous behaviour to establish an ethical baseline, which led to the first main takeaway of the discussion:

We must be capable of expressing how humanity defines, quantifies, and measures ethics before discussing how we might synthesise ethical behaviour.

Dr. Ajmeri clarified that ethical systems must be robust to situations where there are “no good choices”. That is, when even a human might struggle to see the most ethical path forwards. Keen to move away from the trolley problem, Nirav described a group of friends who can’t agree on a restaurant for their evening meal, expounding on the concepts of individual utility, rationality, and fairness to explain why science might fail to resolve the problem.

The mathematical solution might be a restaurant that none of them enjoy, and this could be the same restaurant for every future meal which they attend. From this example, the motivation behind well-defined ethics in socio-technical systems becomes clear; computers lack the ability to apply emotion when reasoning about the impact of their decisions, leading to the second lesson which we took from this talk;

Ethical integration of AI into society necessitates the design of socio-technical systems which can artificially navigate “ethical gridlock”.

Dr. Ajmeri then described the potential of multiagent systems research for designing ethical systems by incorporating agents’ value preferences (ethical requirements) and associated negotiation techniques. This led to a good debate on the merits and flaws of attempting to incorporate emotion into socio-technical systems, with questions such as:

Can the concept of emotion be heuristically defined to enable pseudo-emotional decision making in circumstances when there is no clear virtuous outcome?

Is any attempt to incorporate synthetic emotion inherently deceitful?

These questions were interesting by the very nature that they couldn’t be answered, but the methods described by Nirav did, in the authors opinion, describe a system which could achieve what was required of it – to handle ethically challenging situations in a fair manner.

What must come next is the validation of these systems, with Nirav prompting that the automated handling of information with respect to the (now not-so-recent) GDPR regulations would provide a good test bed, prompting the audience to consider what this implementation might involve.

The end of this talk marked the halfway point of the BIAS summer school, with plenty of great talks and discussions still to come. We would like to thank Dr. Nirav Ajmeri for this discussion, which sits comfortably in the wheelhouse of problems which the Interactive AI CDT has set out to solve.