Risk and Decision Making Module (Queen Mary MSc Programme)

ECS7005P - Risk and Decision-Making for Data Science and AI - 2022

SUMMARY

This module provides a comprehensive overview of the challenges of risk assessment, prediction and decision-making covering public health and medicine, the law, government strategy, transport safety and consumer protection. Students will learn how to see through much of the confusion spoken about risk in public discourse, and will be provided with methods and tools for improved risk assessment that can be directly applied for personal, group, and strategic decision-making. The module also directly addresses the limitations of big data and machine learning for solving decision and risk problems. While classical statistical techniques for risk assessment are introduced (including hypothesis testing, p-values, and regression), the module exposes the severe limitations of these methods. In particular, it focuses on the need for causal modelling of problems and a Bayesian approach to probabilistic reasoning. Bayesian networks are used as a unifying theme throughout.

LEARNING AIMS AND OUTCOMES

By the end of this module you will be able to:

understand the risk assessment challenges in public health and medicine, the law, finance, government strategy, transport safety and consumer protection
see through the many ways risk is misrepresented in the media and by different organisations
reason rationally about risk in a range of different contexts
understand the importance of trade-offs and utilities in risk assessment and decision-making
understand the importance of causal models for effective risk assessment, and be able to build such models for personal, group, and strategic decision-making
be able to undertake decision-making that takes account of conflicting stakeholders and objectives
be able to understand and use basic probability and statistics (using appropriate tools) for risk assessment and quantitative decision-making
understand the limitations of big data and machine learning and how a ‘smart data’ approach leads to improved outcomes
be able to identify and use efficient data-collection strategies for a wide range of risk assessment problems

TEACHING ARRANGEMENTS

This module will be delivered via mixed mode education (MME). You should use, read and learn from the supporting educational materials (“asynchronous content”) on QMPlus at a time that suits you, but following the weekly plan specified in the CONTENT tab section. There are pre-recorded lectures each week, while synchronous and on-campus activities will include weekly labs and a weekly session to review and discuss the lecture material. All students are expected to attend and engage in face-to-face, in-person learning activities on campus, which provide an important opportunity to interact with staff as well as with your fellow students. If you are prevented from coming to campus (e.g. due to international travel restrictions) you will be able to join MME sessions remotely online. All students, irrespective of location, should participate in sessions using a laptop / SMART device to enable interaction with fellow students. This digital interaction will build a strong learning community and student body.

Students are expected to engage with the various additional weekly activities presented on QMPlus. These include: watching short videos, reading the additional material, and working with models and data using tools such as Excel, AgenaRisk and MatLab. Some of these activities must be done before the lecture and some after it. There are also quizzes, which can be completed at any time and do not count towards the final assessment, that help you to self-monitor your understanding and progress.

ASSESSMENT SUMMARY

The main formal assessment will be by a 2-hour written examination during the main exam period. This will count for 80% of the mark.

The other 20% will be based on 2 assignments that will need to be completed in weeks 5 and 9.

RECOMMENDED READING

Agrawal, A., Gans, J., & Goldfarb, A. (2018). Prediction Machines: The Simple Economics of Artificial Intelligence. Harvard Business Review Press.
Fenton, N. E., & Neil, M. (2018). Risk Assessment and Decision Analysis with Bayesian Networks (2nd ed.). CRC Press, Boca Raton.
Gigerenzer, G. (2002). Reckoning with Risk: Learning to Live with Uncertainty. London: Penguin Books.
Kendrick, M. (2015). Doctoring Data: How to Sort Out Medical Advice from Medical Nonsense. Columbus Publishing.
Lagnado, D. A. (2021). Explaining the Evidence: How the Mind Investigates the World. Cambridge University Press.
Pearl, J., & Mackenzie, D. (2018). The Book of Why: The New Science of Cause and Effect. New York: Basic Books.
Salsburg, D. S. (2017). Errors, Blunders, and Lies: How to Tell the Difference. CRC Press, Boca Raton.
Sowell, T. (2018). Discrimination and Disparities. New York: Basic Books.
Spiegelhalter, D. (2019). The Art of Statistics: Learning from Data. Pelican Books.
Taleb, N. N. (2018). Skin in the Game: Hidden Asymmetries in Daily Life. Allen Lane.

DETAILED SYLLABUS BY WEEK

Week 1 - Risk and Decision Making: Illusions and fallacies

This lecture and its supporting materials introduce many of the core ideas in the module. By the end of the week you should have some awareness of why much of the public discourse about risk and statistics is problematic or flawed. Topics: Cognitive biases; Basic probability laws; Probability illusions and puzzles; Mundane and incredible events; Risk perception: What is the safest form of travel?; Assessing medical risks; Spurious correlations; Hidden causal explanations; Basic Simpson's paradox; Limitations of big data; Pearl's Ladder of causation

Supporting Video: Basic Probability Primer Part 1 video: https://youtu.be/kXq1zPS1P4s

Week 2 - Assessing Risk after new evidence - an introduction to Bayes and AgenaRisk

By the end of this lesson and workshop you will understand what Bayes theorem is and why it is central to quantitative risk assessment. You will also know how to perform Bayesian calculations automatically and how to build and run simple Bayesian network models in AgenaRisk. Topics: What does a positive test result mean?; Visual introduction to Bayes Theorem; Conditional Probability; Prosecutor's Fallacy; Independent events; Marginal Probability; Frequentist versus subjective probability; Bayes Theorem; A simple Bayesian network; AgenaRisk introduction and demo
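
The question "What does a positive test result mean?" can be sketched in a few lines of Python. This is only an illustration of Bayes theorem: the prevalence, sensitivity and false positive rate below are hypothetical numbers chosen for the example, not figures from the module materials, and the function name is an assumption.

```python
def posterior_given_positive(prior, sensitivity, false_positive_rate):
    """Return P(condition | positive test) using Bayes theorem."""
    # Marginal probability of a positive result (law of total probability)
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive


# Hypothetical inputs: 1 in 1,000 prevalence, 99% sensitivity, 5% false positives
p_condition = posterior_given_positive(
    prior=0.001, sensitivity=0.99, false_positive_rate=0.05
)
print(f"P(condition | positive test) = {p_condition:.4f}")
```

With these assumed inputs the posterior is just under 2%, far lower than the 99% sensitivity might suggest, because the condition is rare to begin with.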

Supporting videos:

Basic Probability Primer Part 2 video: https://youtu.be/-XbuRGLCaY0
Basic Probability Primer Part 3 video: https://youtu.be/M0nUEy7V2Tw
What does a positive Covid test tell you about the probability you have the virus? https://youtu.be/M0nUEy7V2Tw
Simple introduction to Bayesian Networks with the classic ‘Asia’ model https://youtu.be/v00gk1_DI9M
The Deer Hunter: A lesson in the basics of risk and probability assessment: https://youtu.be/cBgT7hDIzLs
A short and simple explanation of Bayes Theorem: https://youtu.be/HMAxrY8Ob9Y
Building and running diagnostic testing models in a Bayesian network tool (AgenaRisk) https://youtu.be/DwLtVBgPagM
Diagnostic testing: the impact of confirmatory testing explained using simple Bayesian networks https://youtu.be/GLnTC4LLLLA
Bayesian network model for personalised COVID19 risk assessment and contact tracing: https://youtu.be/3KGYuLFMRSY

Week 3 - Classical Statistics for risk assessment

By the end of this week you will understand what the most common classical statistical techniques are and how they need to be supplemented with causal modelling for effective risk assessment and prediction. You will be able to build and run simple models (in Excel, MatLab and AgenaRisk) that both highlight the inadequacies of these techniques and fix them in practice. Topics: Will attending concerts improve your health?; Will sleeping more than 9 hours a day increase your risk of a stroke?; Confounding variables; The Normal distribution and its limitations; Predicting economic growth; Errors of omission: how can average household incomes be falling when average salaries are rising?; Sporting form: quality or luck?; Correlation; Linear regression; What is the probability that the son of a 6ft tall father will be at least 6ft tall?; Regression to the mean
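
The father and son question is a compact illustration of regression to the mean. The sketch below assumes heights follow a bivariate Normal distribution; the mean, standard deviation and correlation are illustrative assumptions, not values taken from the lecture.

```python
from math import sqrt
from statistics import NormalDist

MEAN_HEIGHT = 69.0  # inches; assumed population mean for fathers and sons
SD_HEIGHT = 2.5     # inches; assumed population standard deviation
CORRELATION = 0.5   # assumed father-son height correlation


def prob_son_at_least(threshold, father_height):
    """P(son's height >= threshold | father's height), bivariate Normal model."""
    # The conditional mean is pulled back towards the population mean
    cond_mean = MEAN_HEIGHT + CORRELATION * (father_height - MEAN_HEIGHT)
    cond_sd = SD_HEIGHT * sqrt(1 - CORRELATION ** 2)
    return 1 - NormalDist(cond_mean, cond_sd).cdf(threshold)


p_son_tall = prob_son_at_least(threshold=72.0, father_height=72.0)
print(f"P(son >= 6ft | father = 6ft) = {p_son_tall:.3f}")
```

Under these assumptions the probability is well below 50%: the son of a tall father is expected to be taller than average, but shorter than his father.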

Supporting videos to watch

Galton board, the Normal distribution, and regression to the mean: https://media.qmplus.qmul.ac.uk/media/Galton+Board%2C+the+Normal+distribution+and+Regression+to+the+Mean/1_0yosb7sc
How to do simple distribution fitting in MatLab https://youtu.be/fCJEPs21j0k
How to do distribution fitting in AgenaRisk https://youtu.be/h_ZOj701Ijo
How to do simple regression and correlation in Excel https://youtu.be/ThFArMA61zE
How to do simple regression analysis in MatLab https://youtu.be/K5XyqsrcjUc
Demonstrating and simulating regression to the mean in AgenaRisk: https://youtu.be/K5XyqsrcjUc

Week 4 - Addressing the limitations of classical statistics for risk assessment using Bayesian approaches

There are two parts to this week's materials, but both address fundamental limitations of classical statistical methods for risk assessment by using Bayesian solutions. By the end of this week you will understand how to model and predict rare events, what classical confidence intervals are and how to produce Bayesian alternatives, and finally how to both interpret the statistical claims that result from empirical studies and conduct rigorous hypothesis testing using a Bayesian approach that overcomes the severe limitations of classical hypothesis testing and p-values.

Topics: What is the biggest risk to Las Vegas?; Predictable versus unpredictable risks; Classical and Bayesian confidence intervals; Modelling rare events; What does "there's a 95% chance that most recent global warming is man-made" mean?; Classical hypothesis testing, p-values, and Z-score; Significance testing; The Oomph versus Precision testing dilemma and how to resolve it; Bayesian hypothesis testing; Which drug should we recommend?
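
As a rough illustration of modelling rare events with a Bayesian alternative to a classical interval, the sketch below considers an event that was never observed in a number of trials. It assumes a uniform Beta(1, 1) prior on the event rate, which is an assumption made for this example rather than the approach used in the AgenaRisk videos; the function name is also hypothetical.

```python
def zero_event_posterior(n_trials, credibility=0.95):
    """Posterior summary for an event rate after 0 events in n_trials.

    Assumes a uniform Beta(1, 1) prior, so the posterior is Beta(1, n_trials + 1).
    """
    b = n_trials + 1
    posterior_mean = 1 / (1 + b)
    # For Beta(1, b) the CDF is 1 - (1 - x)**b, which inverts in closed form
    upper_bound = 1 - (1 - credibility) ** (1 / b)
    return posterior_mean, upper_bound


mean_rate, upper_rate = zero_event_posterior(n_trials=100)
print(f"posterior mean rate = {mean_rate:.4f}")
print(f"95% upper credible bound = {upper_rate:.4f}")
```

The observed frequency is zero, yet the posterior still assigns a small but non-zero rate, which is the behaviour one usually wants when reasoning about events that have not happened yet.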

Supporting videos to watch

Modelling rare events in AgenaRisk https://youtu.be/YCCT-UoJIaU
The Binomial distribution https://youtu.be/28ZuaaV9AjY
Creating a Binomial distribution in AgenaRisk https://youtu.be/YRrgvJ0rcDc
Confidence intervals and their Bayesian alternative: https://youtu.be/HzbfF-FjCp8
Simple example demonstrating the limitations of p-values for hypothesis testing https://youtu.be/vk0rKIaGQBs
A simple example of Bayesian hypothesis testing https://youtu.be/s4yCu__18Jo
Bayesian hypothesis test to determine which of two materials is better https://youtu.be/Mj6UgiIxCm4
Bayesian hypothesis testing: which treatment do we choose to reduce mortality rate? https://youtu.be/R9QS1n3DrOA

Week 5 - Risk perception, framing and definitions

By the end of this lesson and workshop you will know the various different common definitions of risk and why a causal framing of risk is necessary to avoid key misunderstandings. You will be able to define and build causal models of risk and opportunity that support decision-making with meaningful quantification including cost-benefit trade-offs; this includes being able to build and run a simple influence diagram in AgenaRisk.

Topics: Risk misperception explained through the Binomial distribution and the Poisson distribution; How unusual is it to see a very high number of deaths in a hospital?; Importance of 'problem framing' for risk assessment; Relative versus Absolute risk; Risk ratios, odds ratios and hazard ratios; Risk as probability times impact (and why this is problematic); Risk registers and their limitations; Heat maps; Risk versus opportunity; Risk and opportunity defined through causal models; Risk from different perspectives; Why did it make sense for Bruce Willis to try to save the world in Armageddon?; Are lawnmowers a greater risk than terrorists?; Need for cost-benefit analysis as part of risk assessment; Influence diagrams; The Ben Geen case - problems with statistically driven criminal investigations
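
Risk misperception with the Binomial distribution can be illustrated with the hospital births question listed in this week's videos. The sketch below compares a small and a large hospital; the daily birth counts (15 and 45) and the 50% chance of a boy are assumptions chosen for the example.

```python
from math import comb


def prob_more_than_60_percent_boys(n_births, p_boy=0.5):
    """P(more than 60% of n_births are boys) under a Binomial model."""
    total = 0.0
    for k in range(n_births + 1):
        # Integer comparison avoids floating point issues at the 60% boundary
        if 10 * k > 6 * n_births:
            total += comb(n_births, k) * p_boy ** k * (1 - p_boy) ** (n_births - k)
    return total


small_hospital = prob_more_than_60_percent_boys(15)  # assumed 15 births a day
large_hospital = prob_more_than_60_percent_boys(45)  # assumed 45 births a day
print(f"small hospital: {small_hospital:.3f}")
print(f"large hospital: {large_hospital:.3f}")
```

The smaller hospital is more likely to record such a day, simply because proportions from small samples vary more. The same reasoning applies when judging whether a cluster of deaths in a single hospital is genuinely unusual.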

Supporting videos to watch

Risk misperception: which hospital is more likely to have more than 60% male births? https://youtu.be/oaDYMbD3_1U
What is the probability the same nurse will be on duty during a series of unusual events? https://youtu.be/Q_G_sgftZ1Q
Relative versus absolute risk https://youtu.be/Q_G_sgftZ1Q
The Poisson distribution and how to use it in AgenaRisk https://youtu.be/w627KDSRLjQ
Influence diagrams in AgenaRisk https://youtu.be/N3AnJzxnxvg

Week 6 - Understanding data through causal paradoxes

By the end of this lesson and workshop you will understand the problems for data analytics that are caused by paradoxes such as Simpson's and Berkson's and you will understand how they can be fully explained - and avoided - by causal models. You will understand why it is so important to consider causal explanations for observed data before attempting to perform any data analytics. You will be able to build simple causal models of observed data in AgenaRisk and perform simple but powerful analyses not possible from the data alone.

Topics: Are attractive people more likely to be mean? (Berkson's paradox, collider bias); If a baby is born underweight why is it more likely to survive if the mother is a smoker rather than a non-smoker?; Simpson's paradox and its causal explanation: Why are women who apply to Cambridge less likely to get in even though, for every single subject, women are more likely to get in than men?; How is it possible the drug is effective for every individual sub-category of people but not effective overall?; Randomized control trials and their limitations; Simulating interventions; Causal explanations of observed data; Does increasing hotel room rates lead to increased revenue?
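
Simpson's paradox is easy to reproduce numerically. The sketch below uses the widely quoted kidney stone treatment counts as an illustration; they are not necessarily the figures used in the module's own kidney stones example.

```python
# (recovered, treated) counts per treatment and stone size (illustrative data)
data = {
    "A": {"small stones": (81, 87), "large stones": (192, 263)},
    "B": {"small stones": (234, 270), "large stones": (55, 80)},
}


def rate(recovered, treated):
    return recovered / treated


def overall_rate(treatment):
    """Recovery rate for a treatment with both stone sizes pooled together."""
    recovered = sum(r for r, _ in data[treatment].values())
    treated = sum(n for _, n in data[treatment].values())
    return rate(recovered, treated)


for group in ("small stones", "large stones"):
    print(group, {t: round(rate(*data[t][group]), 3) for t in data})
print("overall", {t: round(overall_rate(t), 3) for t in data})
```

Treatment A has the higher recovery rate within each stone size, yet treatment B looks better once the groups are pooled. A plausible causal reading is that stone size influences both the choice of treatment and the chance of recovery.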

Supporting videos

Causal explanation for why car accident fatalities decrease in bad weather https://youtu.be/-DVSZ7mcNcE
The Smoking birth weight paradox https://youtu.be/eJNPUfO-Raw
Collider bias ("Berkson's paradox"): how censored data leads to flawed conclusions https://youtu.be/eJNPUfO-Raw
Simpson's paradox example 1: kidney stones https://youtu.be/39RZFm4EEzQ
Simpson's paradox example 2: exercise v diet https://youtu.be/2Dz6XPjD7YE

Week 7 - Interventions and Counterfactuals

By the end of this lesson and workshop you will understand why a causal framework is necessary for answering questions which are about 'interventions' or are 'counterfactuals'. You will be able to build and use models to a) simulate the effect of an intervention (hence reach rung 2 of Pearl's ladder) and b) answer counterfactual questions (hence reach level 3 of Pearl's ladder). You will also understand why counterfactuals and causal models are the rational way to define algorithmic bias and fairness.

Topics: Reaching levels 2 and 3 of Pearl's ladder: interventions and counterfactuals; Simulating randomized control trials for medical interventions: which treatment really is more effective?; Basic kidney stones treatment example; A patient given treatment A survived; would they have survived if they had taken treatment B?; More general medical treatment models; A student who spent nothing on textbooks achieved a 2i in her Computer Science degree. Would she have got a first if she had been given £1000 worth of books?; Algorithmic bias and fairness
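
Simulating an intervention can be sketched with the standard adjustment formula, P(recover | do(T)) = sum over z of P(recover | T, z) P(z). The example below is a minimal sketch that assumes stone size is the only confounder of treatment and recovery and reuses the widely quoted kidney stone counts for illustration; the module itself builds such models in AgenaRisk.

```python
# (recovered, treated) counts keyed by (treatment, stone size); illustrative data
counts = {
    ("A", "small"): (81, 87),
    ("A", "large"): (192, 263),
    ("B", "small"): (234, 270),
    ("B", "large"): (55, 80),
}


def p_stone_size(size):
    """Proportion of all patients with the given stone size."""
    n_size = sum(n for (_, s), (_, n) in counts.items() if s == size)
    n_total = sum(n for _, n in counts.values())
    return n_size / n_total


def p_recover_do(treatment):
    """Estimate P(recover | do(treatment)) by adjusting for stone size."""
    total = 0.0
    for size in ("small", "large"):
        recovered, treated = counts[(treatment, size)]
        total += (recovered / treated) * p_stone_size(size)
    return total


for t in ("A", "B"):
    print(f"P(recover | do({t})) = {p_recover_do(t):.3f}")
```

Under the stated causal assumption, giving everyone treatment A yields a higher estimated recovery rate than giving everyone treatment B, which reverses the conclusion suggested by the pooled observational rates.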

Videos

Answering a counterfactual question in AgenaRisk https://youtu.be/IJjeQvaMfuw
Using ranked nodes in AgenaRisk https://youtu.be/FjERUTPiWjg

Week 8 - Learning from data - algorithms and their accuracy

By the end of this lesson and workshop you will understand the basic principles and methods behind a class of machine learning algorithms called 'supervised learning' and you will be able to apply at least two such methods - logistic regression and naive Bayes - to real data using Excel and AgenaRisk respectively. You will understand how the 'accuracy' of prediction and decision-making algorithms is usually assessed (ROC curves) and be able to compute these in Excel. For problems where you know the causal structure you will be able to use AgenaRisk to automatically learn the table values from data.

Topics:

Can we predict which passengers survived the Titanic?; How well can we predict which students will pass their exam based on number of hours of revision?; Measuring the accuracy of binary classification algorithms: the sensitivity v specificity balance; Measuring accuracy to take account of confidence of prediction: ROC curves; Supervised learning algorithms: Classification trees, Logistic regression, naive Bayes, and other more complex techniques; Over-fitting algorithms; Learning tables from data for causal models
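
The accuracy measures named above can be sketched without any libraries. The data below are hypothetical (hours of revision and pass or fail), and the raw number of hours is used directly as the prediction score; a fitted logistic regression on a single predictor with a positive coefficient would rank the students in the same order and so give the same area under the ROC curve.

```python
# Hypothetical data: hours of revision and whether the student passed (1) or failed (0)
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 1, 0, 1, 1, 0, 1]


def sensitivity_specificity(scores, labels, threshold):
    """Predict 'pass' when score >= threshold; return (sensitivity, specificity)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """Area under the ROC curve: the chance a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


sens, spec = sensitivity_specificity(hours, passed, threshold=5)
print(f"threshold 5: sensitivity = {sens:.2f}, specificity = {spec:.2f}")
print(f"AUC = {auc(hours, passed):.2f}")
```

Moving the threshold trades sensitivity against specificity, while the AUC summarises performance across all thresholds.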

Supporting videos

Logistic regression in Excel https://youtu.be/EKRjDurXau0
Table learning in AgenaRisk https://youtu.be/3khoX_RMrKU

Week 9 - Learning from Data - Limitations and how to avoid them

By the end of this lesson and workshop you will understand why data alone - no matter how much of it you have and no matter which fancy machine learning algorithms you apply - cannot generally achieve either accurate prediction or useful decision support for most risk assessment problems. You will be able to use a combination of data and knowledge to learn the probability tables of causal Bayesian network models in AgenaRisk, even when the data are extremely limited and contain missing values. You will learn how to compute 'potential outcomes' as the answers to counterfactual questions and hence avoid the classic problems associated with missing data values.

Topics: Why different machine learning algorithms all achieve similar accuracy; What would Caroline's salary have been if she had studied for a graduate degree?; Why most machine learning methods cannot move beyond 'prediction'; Why causal Bayesian networks are better than naive Bayes; The inevitability of causal models; Can we have causality without correlation?; Why machine learnt models cannot learn causality; Faithfulness: why does taking the contraceptive pill appear to have no effect on thrombosis when we know it does?; Why even the 'biggest data' are never enough; Learning with data PLUS knowledge; Learning with missing data

Supporting videos

Predicting potential outcomes: structural equation models and Bayesian networks (Note that this material is also in the main lecture) https://youtu.be/DQt9hCxjXCA
Naive Bayes classifiers versus structural models (Note that this material is also in the main lecture) https://youtu.be/DQt9hCxjXCA
Causal discovery from data: the problem with “unfaithful” structural models (Note that this material is also in the main lecture) https://youtu.be/I1tCXog58qs

Week 10 - Guest lecture on Public Policy Making

Guest lecture by Dr Magda Osman on the importance of evidence-based risk assessment in policy decision-making, with special reference to public policy on food safety. This will enable you to understand the basics of behavioural interventions such as 'nudge' techniques and the problems with such techniques when used to support Government policies.

Week 11 - Legal reasoning - AI, data and Bayes

By the end of this lesson and workshop you will understand the role of Bayes in the Law and how it can improve legal reasoning and avoid common fallacies committed in court. You will learn about the likelihood ratio and its role (and limitations) in determining the probative value of different types of evidence. You will understand the basics of DNA evidence (including DNA mixtures) and how forensic scientists use statistics for DNA evidence; you will also understand why DNA evidence is not as convincing as you may think. You will understand the need for, and potential of, causal models (BNs) in legal reasoning and will be able to build and run models that enable you to determine the impact of different types of evidence.

Topics: Introduction to well-known cases where statistics played a key role; Legal rulings about the use of Bayes in the Law; The prosecutor's fallacy and other common probabilistic fallacies made in legal reasoning; DNA evidence and its associated statistics; How to determine the probative value of evidence; The Likelihood ratio: its value and limitations in determining probative value of evidence; The need for causal models (BNs) in evidence evaluation; The special problems of DNA mixture evidence
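
The link between the likelihood ratio and the prosecutor's fallacy can be sketched with the odds form of Bayes theorem: posterior odds = likelihood ratio x prior odds. The random match probability and the number of possible sources below are hypothetical values chosen for illustration, and the sketch assumes the true source is certain to match.

```python
def posterior_probability(prior_probability, likelihood_ratio):
    """Update a prior probability using the odds form of Bayes theorem."""
    prior_odds = prior_probability / (1 - prior_probability)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1 + posterior_odds)


RANDOM_MATCH_PROBABILITY = 1e-6  # hypothetical P(match | defendant is not the source)
POSSIBLE_SOURCES = 10_000_000    # hypothetical number of people who could be the source

# Assumes P(match | defendant is the source) = 1
likelihood_ratio = 1.0 / RANDOM_MATCH_PROBABILITY
prior = 1 / POSSIBLE_SOURCES     # no other evidence: everyone equally likely a priori
p_source = posterior_probability(prior, likelihood_ratio)
print(f"likelihood ratio = {likelihood_ratio:,.0f}")
print(f"P(defendant is the source | match) = {p_source:.3f}")
```

The evidence is strongly probative, with a likelihood ratio of a million, yet under these assumptions the posterior probability is only about 9%. Reading the one-in-a-million match probability as the probability that the defendant is not the source is the prosecutor's fallacy.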

Videos

The Prosecutor's fallacy https://youtu.be/E3VoTTR8MXM
Is a positive test for handling explosives probative if the suspect also handled playing cards? https://youtu.be/0oflOdl1TIg
The Sally Clark case: a simple Bayesian network analysis https://youtu.be/eeWlfSQEiD0
Handling conflicting criminal evidence in a Bayesian network https://youtu.be/eeWlfSQEiD0
Does a tiny trace of matching DNA support the prosecution or defence case? https://youtu.be/eeWlfSQEiD0
On the limitations of statistical DNA evidence https://youtu.be/V0t6m9i093c

Week 12 - Revision

By the end of this week you will understand what the exam structure is and how to do well in it. Worked solutions to exam questions will be presented.