Chapter 2 Measuring health problems


Unit 3: Measures of disease occurrence

Objectives

When you have completed this unit you should be able to:

• Describe the relationship between prevalence and incidence.
• Explain what is meant by “burden of disease”.
• Calculate a disability adjusted life year.
• Explain why public health surveillance systems are important.

3-1 Why is it important to know how common particular diseases are?

It is important to know how common particular diseases are because it allows us to:

• Plan services.
• Monitor whether particular diseases are becoming more or less common.

Knowing how common a disease is helps us to plan services and monitor any change in the frequency of that disease.

3-2 What are the different ways in which the frequency of disease can be measured?

Methods of assessing how often a disease occurs include:

• Measuring prevalence
• Measuring incidence
• Measuring burden of disease
• Public health surveillance

3-3 What is disease prevalence?

Disease prevalence is a way of measuring how many people have a particular disease or condition at any one time. Therefore it is a measure of how common a disease or condition is. For example, the prevalence of diabetes mellitus in the Western Cape is nearly 5%, meaning that at any one time, nearly 5% of the population of the Western Cape can be expected to be diabetic. Prevalence is a simple proportion and is most useful for determining how many people have chronic or long-lasting conditions.

Disease prevalence is a measure of how many people have a particular disease at any one time. It is a measure of how common the disease is.

3-4 What is disease incidence?

Disease incidence indicates how many new cases of a particular disease or condition occur in a specified population over a specified time period. Therefore it is a measure of how frequently the disease occurs. For example, nearly 1% of the population of South Africa develops TB every year. The incidence of new cases of TB in South Africa is therefore 1% per year. Incidence is more appropriately used for short-lived conditions, or when it is useful to know how fast new cases are occurring. Incidence can also be used for events that are not diseases, for example 150 sets of twins born in a year.

Disease incidence is a measure of how many new cases of a disease occur in a given period of time.

Note
Small percentages can be difficult to think about. For lower incidence conditions, disease incidence is often expressed as cases per 100, per 1000 or per 10 000 people per year. For example, a disease incidence of 0.2% per year (0.002 cases per person per year) is better expressed as 2 cases per 1000 people per year. This might also be called 2 cases per 1000 “person-years”. Incidence is sometimes presented as the number of events in a given period, such as 150 sets of twins born in a year.
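The conversion described in this note is simple arithmetic, and a short Python sketch may make it concrete. The function name `incidence_per` and the example numbers are illustrative assumptions, not taken from any published data set. The sketch also assumes a stable population, so that person-years can be approximated as population multiplied by years of observation.

```python
def incidence_per(new_cases, population, years=1.0, per=1000):
    """Return the number of new cases per `per` person-years.

    Assumes a stable population, so that person-years can be
    approximated as population x years of observation.
    """
    if population <= 0 or years <= 0:
        raise ValueError("population and years must be positive")
    person_years = population * years
    return new_cases / person_years * per


# Hypothetical example: 20 new cases in a population of 10 000 over 1 year,
# which is 0.002 cases per person per year.
rate = incidence_per(new_cases=20, population=10_000, years=1, per=1000)
print(f"{rate:g} cases per 1000 person-years")
```

Changing `per` to 100 or 10 000 gives the same incidence on a different scale. Choose the scale that produces numbers that are easy to read and compare.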

Prevalence (how common is a condition) depends on both the incidence (number of new cases) and duration of a disease. Prevalence may be high when either there are many new cases, or when people have the disease for a long time.

Incidence and prevalence can be compared to a tap dripping water into a bucket that has a leak. The incidence is the amount of water that enters the bucket (i.e. new cases). The leak is the amount of water that leaves the bucket (i.e. cases are either cured or the person dies). The prevalence is the total amount of water in the bucket (i.e. the number of people with the disease at any one time). As the incidence increases the prevalence usually also increases.
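The bucket comparison can be sketched as a very simple yearly model in Python. This is only an illustration under stated assumptions: the function name, the idea of a fixed yearly "exit fraction" (the share of existing cases that are cured or die each year) and all the numbers are hypothetical, and real diseases do not behave this neatly.

```python
def simulate_prevalence(new_cases_per_year, yearly_exit_fraction, years,
                        start_cases=0.0):
    """Track the number of existing cases (the water in the bucket).

    new_cases_per_year   -- the tap: new cases added each year (incidence)
    yearly_exit_fraction -- the leak: share of existing cases that are
                            cured or die each year (between 0 and 1)
    """
    if not 0 < yearly_exit_fraction <= 1:
        raise ValueError("yearly_exit_fraction must be between 0 and 1")
    cases = start_cases
    history = []
    for _ in range(years):
        # Water leaks out of the bucket, then the tap adds new cases.
        cases = cases - cases * yearly_exit_fraction + new_cases_per_year
        history.append(cases)
    return history


# Same incidence (100 new cases a year), different duration of disease.
short_illness = simulate_prevalence(100, 0.5, years=200)[-1]   # big leak
long_illness = simulate_prevalence(100, 0.05, years=200)[-1]   # small leak
print(round(short_illness), round(long_illness))
```

With the same tap, the bucket with the smaller leak settles at a much higher level. In this model the long-run number of existing cases is close to the yearly number of new cases divided by the exit fraction, which is one way of seeing why prevalence depends on both incidence and duration.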

3-6 What is the burden of disease?

If the prevalence and incidence are looked at when planning services, one might end up prioritising conditions that are common but not particularly important. For example the common cold is common but is not an important cause of serious illness. What is needed is a measure that indicates the impact of an illness or health condition on a community or population, and for this the “burden of disease” is measured. The best measure of burden of disease is the Disability Adjusted Life Year (DALY). A related measure called the Quality Adjusted Life Year (QALY) is less often used.

The burden of disease is the impact of an illness or health condition on a community or population.

3-7 What is a DALY?

A DALY (disability adjusted life year) is a measure of the years of healthy life lost due to a disease. It is a measure that combines the years of life that are lost as a result of a condition (years lost because of early death) plus the years of life that are lived with the condition (years of disability). It therefore combines mortality and morbidity. The World Health Organisation (WHO) developed the DALY as part of its Global Burden of Disease Study.

The DALY is intended to measure the burden of disease in a community or population, not in an individual.

Note
Tables have been published where conditions have been given disability “weights” (scores) that reflect the impact of the disability on quality of life. These scores range from 0 (perfectly healthy) to 1 (as bad as being dead). So somebody who lives for 10 years with a condition that has a disability score of 0.3 is said to have lost 3 DALYs (10 x 0.3). However, to estimate the burden of disease, it is best to measure the total number of DALYs lost from a particular condition in a particular population rather than the total number of DALYs that an individual person loses. This gives a measure of the burden of disease for that condition in that population.
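The arithmetic in this note can be sketched in Python. This is a minimal illustration that assumes the simple definition given in 3-7 (years of life lost plus years lived with the condition multiplied by the disability weight). It leaves out any refinements used in published estimates. The function name and the small population are hypothetical.

```python
def dalys(years_of_life_lost, years_with_condition, disability_weight):
    """DALYs = years of life lost + (years with condition x disability weight)."""
    if not 0 <= disability_weight <= 1:
        raise ValueError("disability weight must be between 0 and 1")
    years_lived_with_disability = years_with_condition * disability_weight
    return years_of_life_lost + years_lived_with_disability


# The example from the note: 10 years with a weight of 0.3, no early death.
individual = dalys(years_of_life_lost=0, years_with_condition=10,
                   disability_weight=0.3)

# A hypothetical population with the same condition:
# (years of life lost, years lived with the condition) for each person.
people = [(0, 10), (5, 20), (12, 3)]
population_total = sum(dalys(yll, years, 0.3) for yll, years in people)

print(round(individual, 1), round(population_total, 1))
```

Adding up the individual values gives the total DALYs lost to that condition in the population. The population total is the figure that describes the burden of disease.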

The DALY combines early death (mortality) and ill health (morbidity) for particular conditions in a population.

3-8 What is a QALY?

The Quality Adjusted Life Year (QALY) is a related measure that was developed several years before the DALY. While DALYs measure health loss, QALYs measure health gains from interventions by combining the years of life gained with the improvement in quality of life for the years lived with a condition.

Note
Calculating QALYs also means that disability weights (scores) must be used, but these are not the same as DALY weights. The published DALY disease weights have all been measured using the same technique. On the other hand, the published disease weights for QALYs have been measured using a variety of techniques by many different investigators. This means that it is more difficult to compare QALYs across different conditions. Therefore DALYs are more commonly used.

DALYs measure health lost and QALYs measure health gained.

3-9 How well do DALYs capture the impact of diseases on communities and populations?

The DALY is the best measure we have. It measures both early death and the effect of living with a disease for an individual. However it is not a perfect measure of the impact of disease on communities and populations. Important criticisms include:

• The DALY does not measure distress and the effect of a disease on family and other community members.
• A single disability weight is given for a condition that might have a very different impact depending on where the person lives, what services are available and how fully a person is allowed to participate in normal social activities. For example, paraplegia will be experienced differently in Europe and rural Africa.

3-10 How are DALYs used in practice?

It is a complicated process to calculate the DALYs lost to a community or population. Reliable data are required on the prevalence of various conditions, to know how long people had them for, how old they were when they died, and whether they died of this condition or of something else. In practice, DALYs are estimated across different countries and health regions by the WHO and its partners to produce reports about global health trends, but cannot usually be calculated for health planning on a smaller scale.

3-11 What other methods are used to estimate burden of disease?

Health administrators sometimes just use “years of life lost” as an approximation because this information can easily be found from death certificates. However this measure excludes the “years of life with disability” and therefore does not include the non-fatal disease (disability) burden. As a result it must be interpreted with caution. For example, psychiatric conditions can be very disabling but do not usually cause early death.

Be careful of burden of disease reports based only on “years of life lost”. They do not include the “years of life with disability”.

“Caseload” refers to the number of patients (cases) seen with different conditions at a health facility. This is a measure of demand for services at a facility rather than a true measure of disease occurrence in a community. It is still an important measure because top managers look at the caseload when they make decisions about budget allocation to different facilities. Managers may also track the caseload over time to look at trends in service demands.

Caseload is a measure of the demand for services at a facility and the funding needed for that service.

To measure the caseload, the patient’s discharge diagnosis is allocated a code, usually using the World Health Organisation’s International Classification of Diseases (ICD) system. This is a classification of diseases or conditions. It can take a lot of time to look up a code for each patient and therefore it is often not recorded. This means that part of the caseload will not be counted when decisions are made about staffing levels. Fortunately, there are some fast ICD code finders available online, and making a paper list of common codes for the clinic can speed up the process.

Coding of the caseload is important to justify staff allocation. There are ways of doing it quickly and efficiently.

Note
ICD-11, the eleventh revision of the ICD, was released in June 2018.

3-14 What is public health surveillance?

Public health surveillance (monitoring) is the tracking of information about health problems. It has 2 main functions:

• Surveillance can be used to monitor the trends in common health problems, such as HIV prevalence, so that services can be planned.
• Surveillance can be used as an “early warning system” for uncommon diseases that might need fast action such as cholera. Health systems must be alert for infectious diseases that are an epidemic risk so that public health action can be taken early.

Public health surveillance looks for unusual patterns of disease that can alert the authorities to a problem requiring public health action.

3-15 What are the 2 approaches to surveillance?

There are 2 approaches to surveillance:

• Traditional surveillance. This tracks the occurrence of named, diagnosed conditions of importance such as HIV or measles.
• Syndromic surveillance. This tracks the occurrence of symptoms and signs that might suggest an important epidemic, before a diagnosis has been made. An example is sudden onset of muscle weakness, which may suggest polio.

3-16 What approaches are used for traditional surveillance?

Traditional surveillance can be “passive” or “active”:

• Passive surveillance relies on healthcare providers and laboratories submitting the information about named diseases that is required by public health laws. Information from passive surveillance tends to be incomplete because the paperwork for reporting is often not completed.
• Active surveillance involves public health officials contacting people or facilities to ask about and actively search for cases of named diseases. The information from active surveillance tends to be more complete, but it is more expensive to gather.

Traditional surveillance can also be “population based” or “sentinel surveillance”:

• Population based surveillance tries to capture all new cases in a particular population.
• With sentinel surveillance, instead of trying to monitor the entire population, information is collected from a few, selected providers or facilities.

Sentinel surveillance can give more reliable, detailed information from the selected sites, but it may not reflect what is happening elsewhere.

3-17 What approaches are used for traditional surveillance in South Africa?

In South Africa, the following approaches are used for traditional surveillance:

• Notification by clinicians. This is an example of passive surveillance. In South Africa, clinicians who diagnose a notifiable condition are expected to complete a GW17/5 form and submit it to their district office. An online reporting system designed in 2010 is still not functional at the time of writing.
• Laboratory based surveillance. In South Africa, the National Institute for Communicable Diseases collects laboratory data on communicable diseases. They have a number of programmes that use active, passive and sentinel surveillance approaches.

The drawback of all traditional surveillance systems is that they only pick up the named diseases they are designed to detect.

All notifiable conditions should be reported.

3-18 Which common conditions in South Africa are notifiable?

Common notifiable conditions include:

• Congenital syphilis
• Food poisoning
• Hepatitis A and B
• Malaria
• Measles
• Meningococcal infection
• Rheumatic fever
• TB

3-19 What are the strategies for syndromic surveillance?

Syndromic surveillance is designed to pick up the symptoms and signs of diseases that have an epidemic risk. The strategies used in syndromic surveillance are:

• Tracking of pharmacy sales
• School or work absenteeism
• Digital disease surveillance using:
  • Posts on social media
  • Internet search data, such as Google Flu Trends, which monitors the use of certain search terms that might suggest influenza (flu) or a flu-like epidemic

Syndromic reporting systems allow clinicians to report clusters of worrying cases such as suspected polio. This type of syndromic surveillance can only be as effective as the public health system that supports clinicians and acts on reports. Syndromic surveillance systems are not currently widespread globally, but improvement in digital decision support systems and increasing concern about epidemics worldwide mean they are likely to become more widely used.

Case study 1

The prevalence of diabetes is 5% of the general population and the population of your community is 10 000.

1. How many diabetics are likely to live in your community?

About 500 (i.e. 5% of 10 000)

Case study 2

The incidence of HIV has been falling in South Africa since the roll out of antiretroviral treatment. However some newspaper reporters are concerned that the prevalence of HIV infection in South Africa has gone up in the last few years. The national minister of health is asked to explain these findings.

1. Is it good news that the incidence of HIV is falling?

Yes. This indicates that the number of new cases of HIV infection each year is falling.

2. Is it bad news that the prevalence of HIV is climbing?

No. The rising prevalence suggests that people are living with HIV and no longer dying of the condition. As a result the total number of people with HIV is increasing. Even though the incidence continues to fall, ARVs are now available to all people with HIV infection, so the national prevalence is likely to continue to increase for many years.

Case study 3

A person lives with bipolar disorder for 20 years and then dies 5 years prematurely as a result of a car accident during a manic episode. The disability score for bipolar disorder is 0.4.
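The figures in this case study can be turned into a DALY estimate. The sketch below assumes the simple definition from 3-7 (years of life lost plus years lived with the condition multiplied by the disability score). The variable names are illustrative only.

```python
# Figures given in the case study.
years_with_condition = 20     # years lived with bipolar disorder
disability_score = 0.4        # disability weight for bipolar disorder
years_of_life_lost = 5        # died 5 years prematurely

# Years of healthy life lost to disability while alive.
years_lived_with_disability = years_with_condition * disability_score  # 8

# Total DALYs = early death + disability.
total_dalys = years_of_life_lost + years_lived_with_disability         # 13

print(f"Disability: {years_lived_with_disability:g} DALYs")
print(f"Early death: {years_of_life_lost:g} DALYs")
print(f"Total: {total_dalys:g} DALYs")
```

In this example more healthy life is lost to disability (8 years) than to early death (5 years). That part of the burden is missed by a count based only on “years of life lost”.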

Case study 4

Data analysts produce a table of burden of disease based on years of life lost in a particular province. The top 5 priority conditions based on this table are: HIV/AIDS, homicide (murder), perinatal mortality, motor vehicle accidents and TB. As a result of this, top managers decide to allocate special budgets to infectious diseases, midwifery units and trauma units. Psychiatric disease is nowhere on the top 10 list and receives no increase in budget for the next financial year.

1. If you were the manager of the psychiatric unit which desperately needed more staff and beds, how could you explain why psychiatric disease did not make the top 10 of this list, and why it might still be an important contributor to burden of disease?

Psychiatric disease has a high non-fatal (disability) disease burden that will be missed if top managers only look at “years of life lost”. You will need to make this argument and produce data, such as caseload information, to support your claim.

Case study 5

In the United States, disease surveillance is the responsibility of the Centers for Disease Control and Prevention (the CDC). They produce a Morbidity and Mortality Weekly Report (MMWR) that contains information about specific conditions likely to be of public health interest. On 5 June 1981, the report included the following:

“In the period October 1980 to May 1981, 5 young men, all homosexuals, were treated for biopsy-confirmed Pneumocystis carinii pneumonia at 3 different hospitals in Los Angeles, California. Two of the patients died. All 5 patients had laboratory-confirmed previous or current cytomegalovirus (CMV) infection and candidal mucosal infection. Editor’s Note: Pneumocystis pneumonia in the United States is almost exclusively limited to severely immunosuppressed patients. The occurrence of pneumocystosis in these 5 previously healthy individuals without a clinically apparent underlying immunodeficiency is unusual”.

1. This is the first published report of the disease that later became known as AIDS. The editor’s comment is a good example of which type of surveillance?

The editor’s note is an example of syndromic surveillance.

Case study 6

A doctor works in a very busy district hospital casualty unit. The Provincial Health Department has requested that ICD-11 codes are allocated for all patients every time they are seen at the unit and has provided a large coding book. However, he is too busy to find the codes and has asked the admissions clerk to do this. The following year, his request for an additional part-time doctor to help with the patient load is turned down. His manager says that although she knows how busy he is, the top managers could see no evidence of need at the budget meeting and the extra doctor was allocated to another hospital.

1. Why could the request be denied and what can be done to correct it?

The clerk probably did not fill in the ICD codes. They may not have had enough information or training to do so. As far as the top managers were concerned, the uncoded patient visits didn’t happen. The doctor will have to start coding in the present financial year and wait until the next budget meeting, giving his manager sufficient evidence to make the argument for an extra doctor. He can make the coding process much faster and smoother either by using an online code finder, if there is reliable internet access in the casualty unit, or by drawing up a list of the common codes and pinning it to the work station. Coding should ideally be done by clinicians.

Unit 4: Descriptive statistics

Objectives

When you have completed this unit you should be able to:

• Choose the correct form of average for a data set.
• Explain why proportions and rates are more useful than simple numbers in public health.
• Explain what an “outlying” value is in a data set, and how to handle it.
• Detect misleading proportions and rates.

4-1 What are descriptive statistics and how are they used?

Descriptive statistics are used to describe a set of information such as the weights or heights of a group of children. They are also very useful when comparing different sets of data. For example they can be used to compare the cost of services over time and against standards and targets. Commonly used descriptive statistics are:

• Averages
• Proportions
• Rates

Clinicians and managers regularly use averages, proportions and rates to understand changes in the frequency of illness, and the effectiveness of treatment. While they are not complicated, they are often misunderstood and can mislead.

4-2 How should averages be calculated?

Averages can be calculated in 3 different ways that can give quite different values:

• The mean is calculated by adding together a set of values and dividing by the number of values. It is a widely used method for calculating an average, but may not be the best choice when there are “outliers”. An outlier is a value that is much bigger or much smaller than the others. If outliers are included when calculating a mean, they tend to pull the mean towards themselves.
• The median is the “middle value”. It is found by placing all the values in order from the largest to the smallest and then identifying the one in the middle. If there is an even number of values, the average of the 2 middle values is used as the median. The median is useful in situations when the average must include outliers that might distort the mean.
• The mode is the most “popular” or commonest value in a data set. It is used when data are not on a numerical scale (are not numbers) but are placed into categories, for example, favourite colour. It would be meaningless to calculate the mean or median of favourite colour. The mode would be the colour chosen by most people. Other examples of these types of “categorical data” in public health are gender, self-assessed health or employment category.

Use the median when outliers may distort the mean. The mode is used for data which are divided into categories.

4-3 When can averages be misleading?

An average is a summary of 2 or more values. It can help us to make comparisons between different sets of data without having to try and make sense of a large number of individual values. As with any summary, important information may be lost and it is important to think about what is being summarised when an average is calculated.

It can be useful to look at the spread of values around the average when one is trying to decide whether or not the findings in a sample of cases represent the “truth” of what would be found if the whole population could be measured. If most of the values are quite close to the mean, the findings in a sample are more likely to be reliable than if the values are widely spread out.

Note
It can be useful to “unpack” an average to see if there are important differences between subgroups in the set of data, for example, different genders, income groups or other population groups.

4-4 How should the spread of values be assessed?

The spread of values can be assessed by making a graph that shows all the values. Using graphs can show:

• How close to the average most of the values are
• Whether an average might be hiding more than one distinct group of values
• Whether there are “outlying” values that may provide interesting information

4-5 What is the standard deviation and what is the range?

The spread of values around the mean or the median can be calculated:

• For the mean, the correct way to represent the spread of values is called the standard deviation. The smaller the standard deviation, the closer the sample values are to the average. With a high standard deviation, the values are widely spread.
• For the median, spread is indicated by the range of values from lowest to highest.
• The mode does not have a spread of values because it is meaningless to describe a spread of values that fall into categories.

The standard deviation is often used to describe how wide the distribution of values is around the mean.

Note
If the distribution of values around the mean is normal (a bell-shaped curve), about 68% of values will fall within one standard deviation of the mean, about 95% within 2 and over 99% within 3 standard deviations.
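A short Python sketch may help to show what the standard deviation and the range add to an average. It uses the standard library `statistics` module. The two small data sets are hypothetical and were chosen only because they have the same mean. The last function checks the percentages quoted for a normal distribution.

```python
import statistics


def describe_spread(values):
    """Return the mean, sample standard deviation, and lowest and highest values."""
    return {
        "mean": statistics.mean(values),
        "standard_deviation": statistics.stdev(values),
        "lowest": min(values),
        "highest": max(values),
    }


# Two hypothetical samples with the same mean but very different spread.
tight = describe_spread([48, 49, 50, 51, 52])
wide = describe_spread([10, 30, 50, 70, 90])
print(tight)
print(wide)


def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    standard_normal = statistics.NormalDist(mu=0, sigma=1)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, f"{share_within(k):.1%}")
```

Both samples have a mean of 50, so the mean alone cannot tell them apart. The standard deviation and the range show that the values in the second sample are far more widely spread.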

4-6 How should “outliers” be handled?

Sometimes people think that outlying values are a nuisance, and so they are “cleaned” (removed) out of the data. In the health sciences, the outliers can sometimes be the most interesting because they are the cases where something has gone wrong, or something has gone right. It can be useful to look at the reasons for outlying cases because there may be useful lessons to be learned from them.

4-7 What is a proportion?

It is often necessary to know how common a disease or a treatment outcome is in a particular community or population. However a simple count of the number of cases will not provide this information. It is also necessary to know how big the population is, so that the numbers of cases in communities or populations of different sizes can be compared.

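A proportion is the number of cases (the numerator) divided by the number of people in the population from which those cases come (the denominator):

Proportion = number of cases / number of people in the population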

Using proportions instead of simple numbers means that the occurrence of diseases or outcomes can be compared between populations of different sizes.

A familiar type of proportion is the percentage (%), which describes how many cases occur per 100 people in the population.

Often in public health, diseases or outcomes are important but not very common. Instead of using very tiny percentages, these proportions are usually described as cases occurring per 1000, 10 000 or 100 000 people.

A percentage is a proportion where the denominator is a hundred.
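The same proportion can be written on different scales, as the following Python sketch shows. The function names are illustrative. The figures (80 wheelchair users in a community of 20 000 people, compared with about 2% in the UK) come from Case study 3 at the end of this unit.

```python
def proportion(cases, population):
    """Numerator (cases) divided by denominator (population)."""
    if population <= 0:
        raise ValueError("population must be positive")
    if not 0 <= cases <= population:
        raise ValueError("cases must be between 0 and the population size")
    return cases / population


def per(cases, population, base):
    """Express a proportion as cases per `base` people (100 gives a percentage)."""
    return proportion(cases, population) * base


# Wheelchair users in the community from Case study 3.
community_percent = per(80, 20_000, 100)       # as a percentage
community_per_1000 = per(80, 20_000, 1000)     # per 1000 people

# The UK figure of 2% expressed on the same per-1000 scale.
uk_per_1000 = 0.02 * 1000

print(f"{community_percent:g}% or {community_per_1000:g} per 1000")
print(f"UK: {uk_per_1000:g} per 1000")
```

Once both figures share the same denominator scale, communities of very different sizes can be compared directly.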

4-8 What problems occur when using proportions?

Proportions are very widely used in public health to compare health problems and health services. However, they may be inaccurate or misleading. This can happen if the numerator has not been properly counted, but more often it happens because the denominator is inaccurate or has been wrongly chosen.

4-9 What is a rate?

Rates tell us how fast something is happening, either in relation to time or in relation to a life event such as a birth. For example, rates are commonly used in paediatrics to describe how frequently outcomes happen per birth, such as the neonatal death rate per 1000 deliveries.

Rates also have a numerator and denominator and are subject to the same problems as proportions, i.e. the numerator has not been properly counted or the denominator is inaccurate or has been wrongly chosen.

Note
When a rate is recorded in relation to a life event rather than time, it can also be called a ratio. For example, maternal deaths are most commonly reported as the “maternal mortality ratio”, which is the number of maternal deaths per 100 000 live births. Here the numerator (maternal deaths) is not a part of the denominator (live births).

4-10 How can misleading proportions and rates be detected?

Proportions and rates are used to compare health statistics between places and over time. If differences are greater than expected, or otherwise surprising, it is important to look critically at how the information was collected and calculated. The following questions can be asked:

• Was the numerator measured in the same way in different places or over different times?
  • The way in which conditions are diagnosed may differ. A facility that is more active in looking for particular conditions may report a higher prevalence.
  • The way in which activities are recorded may differ. For example, patient visits to the X-ray department may be counted as a separate consultation in Facility A, and as part of the outpatient consultation in Facility B. Therefore even if they are seeing the same number of patients, Facility A will report higher rates of outpatient consultation.
• Was the same denominator used?
  • In a proportion or rate, the denominator should be limited to those people who may potentially develop a condition and end up in the numerator. This is sometimes called the “population at risk”. For example, proportions or rates of prostate cancer should only count men in the denominator. Comparing statistics that use different denominators will be misleading.
• Does the denominator reliably count the population of interest?
  • Often, the denominator is taken from published mid-year population estimates. These rely on an accurate census, and accurate reports of population flows from the Department of Home Affairs.

Case study 1

Some medical students are asked to audit the “average time to be seen” for emergency patients who are given a “red” triage category. They look at the clinical records of “triage red” patients and note the time that ambulance personnel record handover of the patients, together with the time that the first medical notes are recorded. They make a list of “times to be seen” for the 10 “triage red” patients that attended that week. In minutes, these are: 10, 2, 2, 2, 3, 35, 3, 6, 28, 9.

1. Calculate all 3 types of average for this data. Which would you recommend the students use?

The median is probably the best to use because there are 2 outliers (2 patients whose times between ambulance handover and medical notes were 28 and 35 minutes). Those 2 long times “pull” the mean out towards a higher number. The median is not affected to the same extent by the 2 outliers. The mode tells us that more patients were seen within 2 minutes than at any other interval, and is not particularly informative.
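The 3 averages for these 10 values can be checked with Python’s standard `statistics` module. The sketch below uses the times listed in the case study. Treating 28 and 35 minutes as the outliers follows the answer above, and the variable names are illustrative.

```python
import statistics

# "Times to be seen" in minutes for the 10 "triage red" patients.
times = [10, 2, 2, 2, 3, 35, 3, 6, 28, 9]

mean_time = statistics.mean(times)      # 100 / 10 = 10
median_time = statistics.median(times)  # middle of 3 and 6 = 4.5
mode_time = statistics.mode(times)      # most common value = 2

print(f"mean {mean_time}, median {median_time}, mode {mode_time}")

# The effect of the 2 outliers on the mean: recalculate without them.
without_outliers = [t for t in times if t not in (28, 35)]
mean_without_outliers = statistics.mean(without_outliers)  # 37 / 8 = 4.625
print(f"mean without the 2 outliers: {mean_without_outliers}")
```

The mean (10 minutes) is more than double the median (4.5 minutes). Without the 2 long waits the mean falls to about 4.6 minutes, which is close to the median. The second calculation is only there to show the “pull” of the outliers. The outliers should still be investigated rather than deleted.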

2. The students choose the median so that the 2 outliers do not distort the average. What should they do about these outliers?

Outliers should be assessed to see why their values are so different from the others. The students may find that the patients were being actively resuscitated and the doctors did not have time to write their notes earlier (they could check this with the nursing notes). However, it might also be that there were problems with the way these 2 patients were processed and managed and these problems should be identified.

Case study 2

A medical student is asked to find out in which of 2 communities TB is more common. He finds that there are twice as many TB patients in an urban clinic as in a rural clinic and concludes that the problem is twice as big in the urban community.

1. What factors has the student overlooked?

He has looked at the number of cases without assessing the size of the population from which they were drawn. There may be more patients with TB simply because more patients come to that clinic. He can only draw conclusions about the frequency of TB if he compares his numerator with a denominator. He might choose the population size of the community or he might choose the caseload size of the clinic as his denominator.

Case study 3

In a South African community of 20 000 people, you find there are 80 people who use a wheelchair. You wish to compare wheelchair use in your community with use in a high-income country. You find that in the UK, about 2% of the population (or 20 per 1000) uses a wheelchair.

1. What is the proportion of wheelchair users in your community?

80/20 000 = 0.4% (This can also be expressed as 4 per 1000 people).

2. What is the likely reason for the difference in wheelchair use between the South African community and the UK?

It is possible that there are more disabled people in the UK. However, it is more likely that access to wheelchairs is limited in the population you are studying.

Case study 4

At a district health meeting, immunisation rates in children for the previous year are discussed and 2 sub-districts are highlighted for comment. Sub-district A has an immunisation rate of 123% and is praised, while sub-district B has a rate of 67% and is scolded for falling below target. Both are well run sub-districts with committed clinic staff and they have had no supply chain problems. Both sub-districts have a register of all births and the family physician in sub-district B believes immunisations are up to date in nearly all their children.

1. What might explain the discrepancy?

There is likely to be a problem with the way the data has been recorded and/or the proportions calculated because the immunisation rate in sub-district A should not be more than 100%. Problems can arise in the numerator or the denominator.

Immunisation rates are calculated using the number of vaccines administered as the numerator and the number of children eligible for immunisation as the denominator. This number is usually taken from the mid-year population estimate provided by Stats-SA. The mid-year population estimate is derived from the census.

Likely numerator issues are: staff in sub-district A may be counting triple vaccines such as DPT or MMR as 3 immunisations, and staff in sub-district B may not be capturing or submitting all the information. Denominator issues are not uncommon if the true size of the population is different from that predicted by the census. Also, in many agricultural communities migrant labour can produce significant population swings throughout the year: perhaps there was an influx of people into sub-district A, or perhaps migrant labourers have moved elsewhere out of sub-district B. We should not be too quick to blame the data, and the first step should be to look at the clinical services, but these types of data issues are not uncommon.

Case study 5

The head of a hospital obstetric unit is asked to explain why her Caesarean section rate is nearly 60% while other hospitals of the same size in the province have rates of only 17%. She explains that, unlike the other hospitals, there is a nearby midwife obstetric unit (MOU) that deals with all the low risk deliveries and only transfers patients with problems during labour to her ward.

1. Which error was made in calculating her Caesarean section rate?

Senior managers have made a denominator error because they have looked at the number of Caesarean sections per admission to the hospital obstetric unit. The births at the MOU need to be included in the denominator to give the true rate of Caesarean section for this service.
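The effect of choosing the wrong denominator can be illustrated with a short Python sketch. All the counts below are hypothetical and are not given in the case study. They were chosen only so that the hospital-only rate is 60%. The assumption that no Caesarean sections are done at the MOU itself is also made purely for illustration.

```python
def caesarean_rate(caesarean_sections, deliveries):
    """Caesarean sections (numerator) divided by deliveries (denominator)."""
    if deliveries <= 0:
        raise ValueError("deliveries must be positive")
    return caesarean_sections / deliveries


# Hypothetical yearly counts for one maternity service.
hospital_deliveries = 1000   # includes women transferred from the MOU
hospital_caesareans = 600
mou_deliveries = 2500        # low risk deliveries completed at the MOU

# Denominator error: only hospital admissions are counted.
hospital_only_rate = caesarean_rate(hospital_caesareans, hospital_deliveries)

# Corrected denominator: all deliveries in the service (hospital plus MOU).
service_rate = caesarean_rate(
    hospital_caesareans, hospital_deliveries + mou_deliveries
)

print(f"Hospital only: {hospital_only_rate:.0%}")
print(f"Whole service: {service_rate:.0%}")
```

The numerator is the same in both calculations. Only the denominator changes, and with it the apparent Caesarean section rate.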