
This is a plain English summary of an original research article

Early warning scores have been used in UK hospitals since the late 1990s to minimise harm to patients. They are designed to highlight deterioration in a patient’s health using measurements of temperature, heart rate, blood pressure and other easily measured signs. Research has shown that changes in these signs can indicate that patients are at increased risk of having a heart attack or dying.

A new study suggests that many early warning scores are based on flawed research. The scores may not be as effective as they are believed to be. This has important implications both for clinical care and for policymakers.

Developments such as electronic care records mean that more data is being collected and analysed, and this data could be used to develop more effective early warning scores. This research emphasises that new early warning scores must be supported by research that meets high scientific standards.

What’s the issue?

Doctors, nurses and other healthcare professionals use early warning scores to flag patients who are at risk of death, or who need medical intervention, such as admission to intensive care. Used appropriately, the scores could help hospitals focus resources and interventions where they are most needed.

However, the scores may not predict deteriorating health as reliably as many believe. This could be due to flaws in how they were developed. In some studies, the data were not reported clearly enough for other researchers to build on. In others, the statistical methods used to analyse the data were inappropriate.

Traditionally, the data were collected and the scores calculated from observations of vital signs recorded on paper charts at the bedside. As a result, the scores draw on relatively little data and cannot easily be tailored to a patient’s individual situation. The scores most frequently used in practice give equal weight to every vital sign, which may not be appropriate.

Early warning scores are used routinely in clinical practice, so flaws in their development could have a large impact. Equally, so could any improvements.

The National Early Warning Score 2 (NEWS2) is endorsed by NHS England and NHS Improvement. It was introduced in 2017 as a modification of the original NEWS. Its new features are adjusted oxygen saturation scoring thresholds for a subset of patients and a revised consciousness score that includes acute confusion.

NEWS2 was not among the scores examined in the paper. However, it was based on the earlier NEWS, which was not developed using robust statistical methods. Studies validating NEWS were generally found to use poor methodology, so the performance of NEWS in clinical practice has not been reliably determined.

What’s new?

This systematic review examined 95 studies that described the development or validation (checking) of early warning scores. The researchers found that:

  • the majority of the studies used poor statistical methods to account for missing data.
  • two out of five papers did not report important information such as sample size, sex, age or outcome of participants.
  • all papers were at high risk of bias.

Overall, the researchers concluded that the quality of the studies was poor. Many failed to report details of the statistical methods used to develop the score, which means that other researchers cannot check their conclusions.

Most of the early warning scores were used to predict death in hospital, a heart attack, or admission to intensive care. However, the papers used a variety of time frames. Some measured events that occurred within 24 hours of the score being calculated. Others looked at events within 30 days or at any point during a hospital stay. These longer time frames may not be appropriate, since early warning scores are designed to predict outcomes over the next few days.

Few of the papers followed the TRIPOD (transparent reporting of a multivariable prediction model for individual prognosis or diagnosis) guidelines on how this type of research should be reported.

Why is this important?

Poorly designed early warning score systems are unlikely to deliver accurate predictions. As a result, the scores may not reliably prompt doctors and nurses to give patients the interventions they need. Overly rigid protocols driven by scores based on flawed research could also generate excess work for doctors and nurses, diverting them from caring for the patients who need it most.

Technological progress means that more nuanced and personalised algorithms, using more data, could be used to predict outcomes for patients. Electronic health records are increasingly being used to record vital signs and calculate early warning scores. It is important that artificial intelligence and machine learning algorithms are based on high-quality research; otherwise, the flaws in earlier systems will be replicated.

What’s next?

Increasing awareness of the poor design of early warning scores, and the pitfalls in many studies, could help researchers develop better scores in future. The researchers say that scores should predict outcomes:

  • that can be prevented by appropriate treatment
  • within a time frame of a few days at most

Sex-specific scores, or scores designed for specific groups such as older patients or patients with a particular condition, may be more effective. Collecting a wider range of data and improving the reporting of research could help in developing more personalised and effective early warning scores.

Early warning scores are used in every hospital in the country. Even a small improvement in the effectiveness of these scores could have a significant impact.

You may be interested to read

The full paper: Gerry S, and others. Early warning scores for detecting deterioration in adult hospital patients: systematic review and critical appraisal of methodology. BMJ. 2020;369:m1501

Information on the use of early warning scores in the NHS: National Early Warning Score (NEWS), NHS England 


Funding: Lead author Stephen Gerry and one other author Pradeep Virdee are funded by the NIHR Doctoral Research Fellowship. Authors Jacqueline Birks, Peter J Watkinson, Gary S Collins are funded by the NIHR Oxford Biomedical Research Centre.

Conflicts of Interest: The study authors declare no conflicts of interest.

Disclaimer: NIHR Alerts are not a substitute for professional medical advice. They provide information about research which is funded or supported by the NIHR. Please note that views expressed in NIHR Alerts are those of the author(s) and reviewer(s) and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care.



Study author

The quality of the papers was low, with few exceptions. Certain statistical problems were common, and there were problems with the way the models for early warning scores were developed. They didn’t include characteristics such as age and sex which might make the models work better.

These scoring systems are fairly basic at the moment and include vital signs such as heart rate and temperature. In future, it will be important for scores to be more specific to the patient. They should account for age, sex, and the reason the patient is in hospital. Future studies also need to collect data on patients’ comorbidities.

The statistical methods were crude in almost all papers, and the way the models were developed was generally poor. These deficiencies may have a big impact on how the models perform when used in practice. Early warning scores are used so often in the UK alone that even a small improvement in these models could make a really big difference overall.

Stephen Gerry, Senior Medical Statistician, Centre for Statistics in Medicine, University of Oxford

Emergency Medicine Consultant

This paper asks us to be cautious in our use and interpretation of early warning scores. It also provides valuable tips for future research. It should influence health policy researchers, the public and professionals.

Unfortunately, the whole “science” of predicting outcomes lies too much within a biomedical paradigm. This paper challenges several of our assumptions. It uses robust methods for carrying out the review and analysis to minimise its own bias.

Jay Banerjee, Consultant in Emergency Medicine at University Hospitals of Leicester NHS Trust and Honorary Professor of Emergency Care, University of Leicester
