Risk and Decision Making Module (Queen Mary MSc Programme)
ECS7005P - Risk and Decision-Making for Data Science and AI - 2022
SUMMARY
This module provides a comprehensive overview of the challenges of risk assessment, prediction and decision-making covering public health and medicine, the law, government strategy, transport safety and consumer protection. Students will learn how to see through much of the confusion spoken about risk in public discourse, and will be provided with methods and tools for improved risk assessment that can be directly applied for personal, group, and strategic decision-making. The module also directly addresses the limitations of big data and machine learning for solving decision and risk problems. While classical statistical techniques for risk assessment are introduced (including hypothesis testing, p-values, and regression) the module exposes the severe limitations of these methods. In particular, it focuses on the need for causal modelling of problems and a Bayesian approach to probability reasoning. Bayesian networks are used as a unifying theme throughout.
LEARNING AIMS AND OUTCOMES
By the end of this module you will be able to:
-
understand the risk assessment challenges in public health and medicine, the law, finance, government strategy, transport safety and consumer protection
-
see through the many ways risk is misrepresented in the media and by different organisations
-
reason rationally about risk in a range of different contexts
-
understand the importance of trade-offs and utilities in risk assessment and decision-making
-
understand the importance of causal models for effective risk assessment, and be able to build such models for personal, group, and strategic decision-making
-
be able to undertake decision-making that takes account of conflicting stakeholders and objectives
-
be able to understand and use basic probability and statistics (using appropriate tools) for risk assessment and quantitative decision-making
-
understand the limitations of big data and machine learning and how a ‘smart data’ approach leads to improved outcomes
-
be able to identify and use efficient data-collection strategies for a wide range of risk assessment problems
TEACHING ARRANGEMENTS
This module will be delivered via mixed mode education (MME). You should use, read and learn from the supporting educational materials (“asynchronous content”) on QMPlus at a time that suits you, while following the weekly plan specified in the CONTENT tab section. There are pre-recorded lectures each week, and synchronous and on-campus activities will include weekly labs and a weekly session to review and discuss the lecture material. All students are expected to attend and engage in face-to-face, in-person learning activities on campus, which provide an important opportunity to interact with staff as well as with your fellow students. If you are prevented from coming to campus (e.g. due to international travel restrictions) you will be able to join MME sessions remotely online. All students, irrespective of location, should participate in sessions using a laptop / SMART device to enable interaction with fellow students. This digital interaction will build a strong learning community and student body.
Students are expected to engage with the various additional weekly activities presented on QMPlus. These include watching short videos, reading the additional material, and working with models and data using tools such as Excel, AgenaRisk and MatLab. Some of these activities must be done before the lecture and some after it. There are also quizzes that help you to self-monitor your understanding and progress; these can be completed at any time and do not count towards the final assessment.
ASSESSMENT SUMMARY
The main formal assessment will be by a 2-hour written examination during the main exam period. This will count for 80% of the mark.
The other 20% will be based on two assignments, to be completed in weeks 5 and 9.
RECOMMENDED READING
-
Agrawal, A., Gans, J., & Goldfarb, A. (2018). Prediction Machines: The Simple Economics of Artificial Intelligence. Harvard Business Review Press.
-
Fenton, N. E., & Neil, M. (2018). Risk Assessment and Decision Analysis with Bayesian Networks (2nd ed.). CRC Press, Boca Raton.
-
Gigerenzer, G. (2002). Reckoning with Risk: Learning to Live with Uncertainty. London: Penguin Books.
-
Kendrick, M. (2015). Doctoring Data: How to Sort Out Medical Advice from Medical Nonsense. Columbus Publishing.
-
Lagnado, D. A. (2021). Explaining the Evidence: How the Mind Investigates the World. Cambridge University Press.
-
Pearl, J., & Mackenzie, D. (2018). The Book of Why: The New Science of Cause and Effect. New York: Basic Books.
-
Salsburg, D. S. (2017). Errors, Blunders, and Lies: How to Tell the Difference. CRC Press, Boca Raton.
-
Sowell, T. (2018). Discrimination and Disparities. New York: Basic Books.
-
Spiegelhalter, D. (2019). The Art of Statistics: Learning from Data. Pelican Books.
-
Taleb, N. N. (2018). Skin in the Game: Hidden Asymmetries in Daily Life. Allen Lane.
DETAILED SYLLABUS BY WEEK
Week 1 - Risk and Decision Making: Illusions and fallacies
This lecture and its supporting materials introduce many of the core ideas in the module. By the end of the week you should have some awareness of why much of the public discourse about risk and statistics is problematic or flawed. Topics: Cognitive biases; Basic probability laws; Probability illusions and puzzles; Mundane and incredible events; Risk perception: What is the safest form of travel?; Assessing medical risks; Spurious correlations; Hidden causal explanations; Basic Simpson's paradox; Limitations of big data; Pearl's Ladder of causation
Supporting video: Basic Probability Primer Part 1: https://youtu.be/kXq1zPS1P4s
Week 2 - Assessing Risk after new evidence - an introduction to Bayes and AgenaRisk
By the end of this lesson and workshop you will understand what Bayes' theorem is and why it is central to quantitative risk assessment. You will also know how to perform Bayesian calculations automatically and how to build and run simple Bayesian network models in AgenaRisk. Topics: What does a positive test result mean?; Visual introduction to Bayes' theorem; Conditional probability; Prosecutor's fallacy; Independent events; Marginal probability; Frequentist versus subjective probability; Bayes' theorem; A simple Bayesian network; AgenaRisk introduction and demo
Supporting videos:
-
Basic Probability Primer Part 2 video: https://youtu.be/-XbuRGLCaY0
-
Basic Probability Primer Part 3 video: https://youtu.be/M0nUEy7V2Tw
-
What does a positive Covid test tell you about the probability you have the virus? https://youtu.be/M0nUEy7V2Tw
-
Simple introduction to Bayesian Networks with the classic ‘Asia’ model https://youtu.be/v00gk1_DI9M
-
The Deer Hunter: A lesson in the basics of risk and probability assessment: https://youtu.be/cBgT7hDIzLs
-
A short and simple explanation of Bayes Theorem: https://youtu.be/HMAxrY8Ob9Y
-
Building and running diagnostic testing models in a Bayesian network tool (AgenaRisk) https://youtu.be/DwLtVBgPagM
-
Diagnostic testing: the impact of confirmatory testing explained using simple Bayesian networks https://youtu.be/GLnTC4LLLLA
-
Bayesian network model for personalised COVID19 risk assessment and contact tracing: https://youtu.be/3KGYuLFMRSY
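As an illustration of the Week 2 question "What does a positive test result mean?", the following minimal Python sketch applies Bayes' theorem to a diagnostic test. The function name and the prevalence, sensitivity and specificity values are assumptions chosen for illustration; they are not taken from the module materials.

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """Return P(condition | positive test) using Bayes' theorem."""
    # P(positive and condition)
    true_positive = sensitivity * prior
    # P(positive and no condition): the false positive rate is 1 - specificity
    false_positive = (1.0 - specificity) * (1.0 - prior)
    return true_positive / (true_positive + false_positive)


# Illustrative numbers only: 1% prevalence, 99% sensitivity, 95% specificity.
posterior = posterior_given_positive(prior=0.01, sensitivity=0.99, specificity=0.95)
print(f"P(condition | positive test) = {posterior:.3f}")
```

With these assumed values the posterior is about 17%, far lower than the test's 99% sensitivity might suggest, because false positives from the large healthy population outnumber true positives.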
Week 3 - Classical Statistics for risk assessment
By the end of this week you will understand what the most common classical statistical techniques are and how they need to be supplemented with causal modelling for effective risk assessment and prediction. You will be able to build and run simple models (in Excel, MatLab and AgenaRisk) that both highlight the inadequacies of these techniques and fix them in practice. Topics: Will attending concerts improve your health?; Will sleeping more than 9 hours a day increase your risk of a stroke?; Confounding variables; The Normal distribution and its limitations; Predicting economic growth; Errors of omission: how can average household incomes be falling when average salaries are rising?; Sporting form: quality or luck?; Correlation; Linear regression; What is the probability that the son of a 6ft tall father will be at least 6ft tall?; Regression to the mean
Supporting videos to watch
-
Galton board, the Normal distribution, and regression to the mean: https://media.qmplus.qmul.ac.uk/media/Galton+Board%2C+the+Normal+distribution+and+Regression+to+the+Mean/1_0yosb7sc
-
How to do simple distribution fitting in MatLab https://youtu.be/fCJEPs21j0k
-
How to do distribution fitting in AgenaRisk https://youtu.be/h_ZOj701Ijo
-
How to do simple regression and correlation in Excel https://youtu.be/ThFArMA61zE
-
How to do simple regression analysis in MatLab https://youtu.be/K5XyqsrcjUc
-
Demonstrating and simulating regression to the mean in AgenaRisk: https://youtu.be/K5XyqsrcjUc
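The Week 3 topics "Sporting form: quality or luck?" and "Regression to the mean" can be sketched with a short simulation. This is a minimal Python illustration, not the AgenaRisk model used in the module. It assumes each observed score is the sum of a fixed skill and independent luck, both drawn from a standard Normal distribution; all names and parameters are hypothetical.

```python
import random
from statistics import mean


def simulate_two_rounds(n_players=10_000, seed=0):
    """Simulate two rounds of scores: same skill, fresh luck each round."""
    rng = random.Random(seed)
    skill = [rng.gauss(0, 1) for _ in range(n_players)]
    round1 = [s + rng.gauss(0, 1) for s in skill]
    round2 = [s + rng.gauss(0, 1) for s in skill]
    return round1, round2


def top_decile_means(round1, round2):
    """Mean round-1 and round-2 scores of the top 10% of round-1 performers."""
    cutoff = sorted(round1, reverse=True)[len(round1) // 10 - 1]
    top = [i for i, score in enumerate(round1) if score >= cutoff]
    return mean(round1[i] for i in top), mean(round2[i] for i in top)


round1, round2 = simulate_two_rounds()
first, second = top_decile_means(round1, round2)
print(f"Top performers, round 1 mean: {first:.2f}")
print(f"Same players, round 2 mean:  {second:.2f}")
```

Under these assumptions the round-1 stars remain above average in round 2 but score noticeably lower than before, even though nobody's skill has changed: part of their round-1 success was luck that does not repeat.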
Week 4 - Addressing the limitations of classical statistics for risk assessment using Bayesian approaches
There are two parts to this week's materials, both of which address fundamental limitations of classical statistical methods for risk assessment by using Bayesian solutions. By the end of this week you will understand how to model and predict rare events, what classical confidence intervals are and how to produce Bayesian alternatives, how to interpret the statistical claims that result from empirical studies, and how to conduct rigorous hypothesis testing using a Bayesian approach that overcomes the severe limitations of classical hypothesis testing and p-values.
Topics: What is the biggest risk to Las Vegas?; Predictable versus unpredictable risks; Classical and Bayesian confidence intervals; Modelling rare events; What does "there's a 95% chance that most recent global warming is man-made" mean?; Classical hypothesis testing, p-values, and Z-score; Significance testing; The Oomph versus Precision testing dilemma and how to resolve it; Bayesian hypothesis testing; Which drug should we recommend?
Supporting videos to watch
-
Modelling rare events in AgenaRisk https://youtu.be/YCCT-UoJIaU
-
The Binomial distribution https://youtu.be/28ZuaaV9AjY
-
Creating a Binomial distribution in AgenaRisk https://youtu.be/YRrgvJ0rcDc
-
Confidence intervals and their Bayesian alternative: https://youtu.be/HzbfF-FjCp8
-
Simple example demonstrating the limitations of p-values for hypothesis testing https://youtu.be/vk0rKIaGQBs
-
A simple example of Bayesian hypothesis testing https://youtu.be/s4yCu__18Jo
-
Bayesian hypothesis test to determine which of two materials is better https://youtu.be/Mj6UgiIxCm4
-
Bayesian hypothesis testing: which treatment do we choose to reduce mortality rate? https://youtu.be/R9QS1n3DrOA
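The Week 4 question "Which drug should we recommend?" can be framed as a Bayesian hypothesis test that asks directly for the probability that one treatment has the lower mortality rate. The sketch below is a Monte Carlo approximation in Python rather than the AgenaRisk approach shown in the videos. It assumes uniform Beta(1, 1) priors on each mortality rate, and the trial counts are invented for illustration.

```python
import random


def prob_a_has_lower_rate(deaths_a, n_a, deaths_b, n_b, draws=50_000, seed=1):
    """Estimate P(mortality rate of A < mortality rate of B | data).

    With a uniform Beta(1, 1) prior and Binomial data, the posterior for
    each rate is Beta(1 + deaths, 1 + survivors).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + deaths_a, 1 + n_a - deaths_a)
        rate_b = rng.betavariate(1 + deaths_b, 1 + n_b - deaths_b)
        if rate_a < rate_b:
            wins += 1
    return wins / draws


# Hypothetical trial: 10 deaths in 100 patients on A, 20 in 100 on B.
p_a_better = prob_a_has_lower_rate(10, 100, 20, 100)
print(f"P(A has the lower mortality rate) is roughly {p_a_better:.2f}")
```

Unlike a p-value, the result is a direct probability statement about the hypothesis of interest, which can then feed into a cost-benefit decision.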
Week 5 - Risk perception, framing and definitions
By the end of this lesson and workshop you will know the various different common definitions of risk and why a causal framing of risk is necessary to avoid key misunderstandings. You will be able to define and build causal models of risk and opportunity that support decision-making with meaningful quantification including cost-benefit trade-offs; this includes being able to build and run a simple influence diagram in AgenaRisk.
Topics: Risk misperception explained through the Binomial distribution and the Poisson distribution; How unusual is it to see a very high number of deaths in a hospital?; Importance of 'problem framing' for risk assessment; Relative versus Absolute risk; Risk ratios, odds ratios and hazard ratios; Risk as probability times impact (and why this is problematic); Risk registers and their limitations; Heat maps; Risk versus opportunity; Risk and opportunity defined through causal models; Risk from different perspectives; Why did it make sense for Bruce Willis to try to save the world in Armageddon?; Are lawnmowers a greater risk than terrorists?; Need for cost-benefit analysis as part of risk assessment; Influence diagrams; The Ben Geen case - problems with statistically driven criminal investigations
Supporting videos to watch
-
Risk misperception: which hospital is more likely to have more than 60% male births? https://youtu.be/oaDYMbD3_1U
-
What is the probability the same nurse will be on duty during a series of unusual events? https://youtu.be/Q_G_sgftZ1Q
-
Relative versus absolute risk https://youtu.be/Q_G_sgftZ1Q
-
The Poisson distribution and how to use it in AgenaRisk https://youtu.be/w627KDSRLjQ
-
Influence diagrams in AgenaRisk https://youtu.be/N3AnJzxnxvg
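The hospital question in the Week 5 videos ("which hospital is more likely to have more than 60% male births?") can be answered exactly with the Binomial distribution. The Python sketch below assumes each birth is independently male with probability 0.5, and compares a small hospital with 15 births a day against a large one with 45; these sizes are illustrative assumptions and may differ from the figures used in the video.

```python
from math import comb


def prob_more_than_60_percent_boys(n_births, p_boy=0.5):
    """Exact Binomial probability that more than 60% of n_births are boys."""
    total = 0.0
    for k in range(n_births + 1):
        # Integer comparison avoids floating-point issues with k / n > 0.6.
        if 10 * k > 6 * n_births:
            total += comb(n_births, k) * p_boy**k * (1 - p_boy) ** (n_births - k)
    return total


small = prob_more_than_60_percent_boys(15)
large = prob_more_than_60_percent_boys(45)
print(f"Small hospital (15 births/day): {small:.3f}")
print(f"Large hospital (45 births/day): {large:.3f}")
```

The smaller hospital sees such days more often because proportions based on small samples vary more. The same reasoning applies when asking how unusual a high number of deaths in a single hospital really is.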
Week 6 - Understanding data through causal paradoxes
By the end of this lesson and workshop you will understand the problems for data analytics that are caused by paradoxes such as Simpson's and Berkson's and you will understand how they can be fully explained - and avoided - by causal models. You will understand why it is so important to consider causal explanations for observed data before attempting to perform any data analytics. You will be able to build simple causal models of observed data in AgenaRisk and perform simple but powerful analyses not possible from the data alone.
Topics: Are attractive people more likely to be mean? (Berkson's paradox, collider bias); If a baby is born underweight why is it more likely to survive if the mother is a smoker rather than a non-smoker?; Simpson's paradox and its causal explanation: Why are women who apply to Cambridge less likely to get in even though, for every single subject, women are more likely to get in than men? How is it possible that a drug is effective for every individual sub-category of people but not effective overall?; Randomized control trials and their limitations; Simulating interventions; Causal explanations of observed data; Does increasing hotel room rates lead to increased revenue?
Supporting videos
-
Causal explanation for why car accident fatalities decrease in bad weather https://youtu.be/-DVSZ7mcNcE
-
The Smoking birth weight paradox https://youtu.be/eJNPUfO-Raw
-
Collider bias ("Berkson's paradox"): how censored data leads to flawed conclusions https://youtu.be/eJNPUfO-Raw
-
Simpson's paradox example 1: kidney stones https://youtu.be/39RZFm4EEzQ
-
Simpson's paradox example 2: exercise v diet https://youtu.be/2Dz6XPjD7YE
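Simpson's paradox can be checked with a few lines of arithmetic. The Python sketch below uses the widely quoted kidney stone treatment figures (Charig et al., 1986) as an assumption; the module's own video may use different numbers. It compares success rates within each stone-size group and overall.

```python
# (successes, patients) for each treatment, split by stone size.
outcomes = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def group_rate(treatment, size):
    """Success rate for one treatment within one stone-size group."""
    successes, patients = outcomes[treatment][size]
    return successes / patients


def overall_rate(treatment):
    """Success rate for one treatment with the groups pooled together."""
    groups = outcomes[treatment].values()
    return sum(s for s, _ in groups) / sum(n for _, n in groups)


for size in ("small", "large"):
    print(f"{size} stones: A = {group_rate('A', size):.2f}, B = {group_rate('B', size):.2f}")
print(f"overall:      A = {overall_rate('A'):.2f}, B = {overall_rate('B'):.2f}")
```

Treatment A has the higher success rate in both groups, yet B looks better when the groups are pooled. In this dataset A was given disproportionately to patients with large stones, who have worse outcomes whichever treatment they receive, so stone size confounds the pooled comparison. Only a causal model of how treatments were assigned tells you which comparison to trust.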
Week 7 - Interventions and Counterfactuals
By the end of this lesson and workshop you will understand why a causal framework is necessary for answering questions which are about 'interventions' or are 'counterfactuals'. You will be able to build and use models to a) simulate the effect of an intervention (hence reach level 2 of Pearl's ladder) and b) answer counterfactual questions (hence reach level 3 of Pearl's ladder). You will also understand why counterfactuals and causal models are the rational way to define algorithmic bias and fairness.
Topics: Reaching levels 2 and 3 of Pearl's ladder: interventions and counterfactuals; Simulating randomized control trials for medical interventions: which treatment really is more effective? Basic kidney stones treatment example; A patient given treatment A survived; would they have survived if they had taken treatment B?; More general medical treatment models; A student who spent nothing on text books achieved a 2i in her Computer Science degree. Would she have got a first if she had been given £1000 worth of books?; Algorithm bias and fairness
Videos
-
Answering a counterfactual question in AgenaRisk https://youtu.be/IJjeQvaMfuw
-
Using ranked nodes in AgenaRisk https://youtu.be/FjERUTPiWjg
Week 8 - Learning from data - algorithms and their accuracy
By the end of this lesson and workshop you will understand the basic principles and methods behind a class of machine learning algorithms called 'supervised learning' and you will be able to apply at least two such methods - logistic regression and naive Bayes - to real data using Excel and AgenaRisk respectively. You will understand how the 'accuracy' of prediction and decision-making algorithms is usually assessed (ROC curves) and be able to compute these in Excel. For problems where you know the causal structure you will be able to use AgenaRisk to automatically learn the table values from data.
Topics: Can we predict which passengers survived the Titanic?; How well can we predict which students will pass their exam based on number of hours of revision?; Measuring the accuracy of binary classification algorithms: the sensitivity v specificity balance; Measuring accuracy to take account of confidence of prediction: ROC curves; Supervised learning algorithms: Classification trees, Logistic regression, naive Bayes, and other more complex techniques; Over-fitting algorithms; Learning tables from data for causal models
Supporting videos
-
Logistic regression in Excel https://youtu.be/EKRjDurXau0
-
Table learning in AgenaRisk https://youtu.be/3khoX_RMrKU
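The sensitivity v specificity balance and the ROC curve from the Week 8 topics can be computed by hand for a small example. The module does this in Excel; the Python sketch below is an equivalent illustration with hypothetical labels and predicted scores. The area under the curve (AUC) is obtained with the trapezoid rule.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Classify as positive when score >= threshold; return (sens, spec)."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(labels, scores):
    """ROC points (false positive rate, true positive rate), from (0, 0) to (1, 1)."""
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        sens, spec = sensitivity_specificity(labels, scores, threshold)
        points.append((1.0 - spec, sens))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical data: 1 = passed the exam, score = predicted probability of passing.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
area = auc(roc_points(labels, scores))
print(f"AUC = {area:.3f}")
```

Lowering the threshold raises sensitivity at the cost of specificity; the ROC curve traces that trade-off across all thresholds, and an AUC of 0.5 corresponds to guessing at random.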
Week 9 - Learning from Data - Limitations and how to avoid them
By the end of this lesson and workshop you will understand why data alone - no matter how much of it you have and no matter which fancy machine learning algorithms you apply - cannot generally achieve either accurate prediction or useful decision support for most risk assessment problems. You will be able to use a combination of data and knowledge to learn the probability tables of causal Bayesian network models in AgenaRisk, even when the data are extremely limited and contain missing values. You will learn how to compute 'potential outcomes' as the answers to counterfactual questions and hence avoid the classic problems associated with missing data values.
Topics: Why different machine learning algorithms all achieve similar accuracy; What would Caroline's salary have been if she had studied for a graduate degree?; Why most machine learning methods cannot move beyond 'prediction'; Why causal Bayesian networks are better than naive Bayes; The inevitability of causal models; Can we have causality without correlation?; Why machine learnt models cannot learn causality; Faithfulness: why does taking the contraceptive pill appear to have no effect on thrombosis when we know it does?; Why even the 'biggest data' are never enough; Learning with data PLUS knowledge; Learning with missing data
Supporting videos
-
Predicting potential outcomes: structural equation models and Bayesian networks (Note that this material is also in the main lecture) https://youtu.be/DQt9hCxjXCA
-
Naive Bayes classifiers versus structural models (Note that this material is also in the main lecture) https://youtu.be/DQt9hCxjXCA
-
Causal discovery from data: the problem with “unfaithful” structural models (Note that this material is also in the main lecture) https://youtu.be/I1tCXog58qs
Week 10 - Guest lecture on Public Policy Making
Guest lecture by Dr Magda Osman on the importance of evidence-based risk assessment in policy decision-making, with special reference to public policy on food safety. This will enable you to understand the basics of behavioural interventions such as 'nudge' techniques and the problems with such techniques when used to support Government policies.
Week 11 - Legal reasoning - AI, data and Bayes
By the end of this lesson and workshop you will understand the role of Bayes in the Law and how it can improve legal reasoning and avoid common fallacies committed in court. You will learn about the likelihood ratio and its role (and limitations) in determining the probative value of different types of evidence. You will understand the basics of DNA evidence (including DNA mixtures) and how forensic scientists use statistics for DNA evidence; you will also understand why DNA evidence is not as convincing as you may think. You will understand the need for, and potential of, causal models (BNs) in legal reasoning and will be able to build and run models that enable you to determine the impact of different types of evidence.
Topics: Introduction to well-known cases where statistics played a key role; Legal rulings about the use of Bayes in the Law; The prosecutor's fallacy and other common probabilistic fallacies made in legal reasoning; DNA evidence and its associated statistics; How to determine the probative value of evidence; The likelihood ratio: its value and limitations in determining probative value of evidence; The need for causal models (BNs) in evidence evaluation; The special problems of DNA mixture evidence
Videos
-
The Prosecutor's fallacy https://youtu.be/E3VoTTR8MXM
-
Is a positive test for handling explosives probative if the suspect also handled playing cards? https://youtu.be/0oflOdl1TIg
-
The Sally Clark case: a simple Bayesian network analysis https://youtu.be/eeWlfSQEiD0
-
Handling conflicting criminal evidence in a Bayesian network https://youtu.be/eeWlfSQEiD0
-
Does a tiny trace of matching DNA support the prosecution or defence case? https://youtu.be/eeWlfSQEiD0
-
On the limitations of statistical DNA evidence https://youtu.be/V0t6m9i093c
Week 12 - Revision
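The likelihood ratio and the prosecutor's fallacy from the Week 11 topics can be illustrated with the odds form of Bayes' theorem: posterior odds = prior odds × likelihood ratio. The Python sketch below uses invented numbers: a random match probability of 1 in a million, a pool of 10 million people who could equally have left the trace, and the simplifying assumption that the true source matches with certainty.

```python
def posterior_probability(prior_probability, likelihood_ratio):
    """Update a prior probability with a likelihood ratio (odds form of Bayes)."""
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Assumed figures for illustration only.
random_match_probability = 1e-6  # P(match | defendant is not the source)
likelihood_ratio = 1.0 / random_match_probability  # assumes P(match | source) = 1
prior = 1.0 / 10_000_000  # defendant is one of 10 million possible sources

posterior = posterior_probability(prior, likelihood_ratio)
print(f"P(defendant is the source | match) = {posterior:.3f}")
```

The prosecutor's fallacy would report the chance that the defendant is not the source as 1 in a million. Under these assumptions the posterior probability that the defendant is the source is only about 9%, because roughly ten other people in the pool would be expected to match as well.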
By the end of this week you will understand what the exam structure is and how to do well in it. Worked solutions to exam questions will be presented.