Blog & News
What Are Statistical Tests? Statistical Testing and How to Determine Statistical Significance — A SHADAC Basics Blog
September 06, 2024
Basics Blog Introduction
SHADAC has created a series of “Basics Blogs” to familiarize readers with common terms, concepts, and topics that are frequently covered.
This Basics Blog will focus on the topic of statistical tests & testing—including what statistical testing is, how to do a statistical test (specifically, how to do a t test), related definitions, and how to interpret statistical significance, within the context of data drawn from SHADAC’s State Health Compare.
Keep reading below to learn more about statistical testing and statistical significance.
What Is Statistical Testing?
Statistical testing can tell us if there is enough evidence to determine if an estimate is statistically different from another estimate, within a certain level of confidence.
There are various types of statistical tests that are used in different situations. The test used depends on the type of data, variable types, data distribution, and other factors. Some examples of statistical tests include:
- T-Test
- Z-Test
- ANOVA test
Researchers at SHADAC most often use a “T-Test” to test for significance in our research and analysis. T-tests are typically used to compare the means or percentages of two independent populations.
Definitions
To get a better understanding of statistical testing and its meaning in data-driven research, we will walk through an example, using a t-test. Then, we can determine statistical significance from the results of the t-test, which we will explain later in this blog. Before that, however, we will need to define some related terms and values that are used in the t-test formula.
The following formula is used to conduct t-tests:

$$t = \frac{\text{Estimate}_1 - \text{Estimate}_2}{\sqrt{SE_1^2 + SE_2^2}}$$
Let’s break down this formula and look at the definitions of each of its parts:
Statistical Estimates: approximations of an unknown true population value, calculated from a sample of that population.
Statistical estimates are the “best guess” of a true population.
Difference in Estimates: the mathematical difference between the statistical estimates (estimate 1 – estimate 2).
In a t-test, estimates are typically a percentage or mean.
Standard Error (SE): estimate of the (un)certainty and precision of a given estimate.
Smaller standard error = less uncertainty, greater precision
Larger standard error = more uncertainty, less precision
T-Critical Value: a value that is the threshold for significance.
T-critical values are set based on what confidence interval is being used. 1.96 is the t-critical value for tests using a 95% confidence interval, which is commonly used.
T-Statistic Value: used to determine whether a value is significant or not
The t-statistic value is compared to the t-critical value to determine significance.
Margin of Error (MOE): the range above and below a provided estimate within which the true population value is expected to lie, at a specified confidence level.
The smaller the margin of error, the greater the precision of the estimate.
Confidence Interval: expresses the interval in which we expect the true population value to fall at a given level of certainty.
The greater the confidence interval (ex: 99% compared to 95%), the greater the certainty. For example, we could say we are 95% confident that a value is significant.
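The 1.96 threshold cited above can be recovered with a short calculation. The following is a minimal sketch, assuming the large-sample (standard normal) approximation, under which the two-sided critical value for a given confidence level is a quantile of the normal distribution. The function name `critical_value` is illustrative and is not part of any SHADAC tool. For small samples, a t distribution with the appropriate degrees of freedom would give somewhat larger values.

```python
from statistics import NormalDist


def critical_value(confidence_level: float) -> float:
    """Two-sided critical value under a standard normal approximation.

    Assumption: samples are large enough that the normal distribution
    is a reasonable stand-in for the t distribution.
    """
    if not 0.0 < confidence_level < 1.0:
        raise ValueError("confidence_level must be between 0 and 1")
    # Split the leftover probability evenly between the two tails.
    tail = (1.0 - confidence_level) / 2.0
    return NormalDist().inv_cdf(1.0 - tail)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> critical value {critical_value(level):.2f}")
```

Higher confidence levels produce larger critical values, so a difference has to be larger relative to its standard error before it counts as significant.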
How to Do a T-Test: Using State Health Compare
Now, let’s put all of those together. Using State Health Compare, let’s figure out whether the percentages of adults having a telehealth visit in Illinois in 2020-2021 and 2021-2022 are statistically different from each other.
In 2020-2021, 31.28% of adults had a telehealth visit. In 2021-2022, 27.92% of adults had a telehealth visit.
Figure 1. Percent of Adults in Illinois Who Had a Telehealth Visit by Year
While we can see that a higher percentage of adults had a telehealth visit in 2020-2021 compared to 2021-2022, this doesn’t necessarily mean that the difference is statistically significant.
To find out if these percentages are statistically different from each other, within a 95% confidence interval, we will conduct a t-test.
First, we will set our t-critical value, which corresponds to what confidence interval you are using. We will be using a 95% confidence interval, which corresponds to a t-critical value of 1.96.
The data from State Health Compare gives us the percentages for each state, as well as the margins of error (MOE). The MOEs for 2020-2021 and 2021-2022 are 0.02021 and 0.02175, respectively.
Using the MOE, we can calculate the standard error (SE). To calculate the SE, divide the MOE by our t-critical value, 1.96.
SE for 2020-2021: 0.02021 / 1.96 = 0.01031
SE for 2021-2022: 0.02175 / 1.96 = 0.01109
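The conversion from margin of error to standard error can be checked in a few lines. This is a minimal sketch using the MOE values quoted above; the function name `standard_error_from_moe` is hypothetical, and the default of 1.96 assumes the MOEs were published at the 95% level, as this post describes.

```python
def standard_error_from_moe(moe: float, critical_value: float = 1.96) -> float:
    """Recover a standard error from a published margin of error.

    Assumption: the MOE was computed as critical_value * SE.
    """
    return moe / critical_value


se_2020_2021 = standard_error_from_moe(0.02021)
se_2021_2022 = standard_error_from_moe(0.02175)

print(f"SE for 2020-2021: {se_2020_2021:.6f}")
print(f"SE for 2021-2022: {se_2021_2022:.6f}")
```

Depending on whether intermediate values are rounded or truncated, the last digit of a hand calculation can differ slightly from the computed value. A difference that small does not affect the outcome of the test in this example.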
We will need these numbers to calculate our t-statistic. However, before we can calculate the t-statistic, we need one more value: the difference in estimates.
This value is calculated by subtracting one estimate from the other. We will use the percentages of telehealth visits from each time period.
Difference in Estimates of 2020-2021 & 2021-2022: 0.3128 – 0.2792 = 0.0336
Now, we have all the values needed to calculate our t-statistic, which is the ratio of the difference between the two estimates to the variability of the estimates. In other words, it answers: “How close to (or how far away from) ‘0’ is the difference, given the variability between the two estimates?”
The t-statistic is calculated by plugging in the numbers from above into this formula:

$$t = \frac{0.0336}{\sqrt{0.01031^2 + 0.01109^2}} \approx 2.22$$
We can now compare this t-statistic to the t-critical value we determined above, which was 1.96. If the absolute value of the t-statistic (its value regardless of a positive or negative sign) is greater than the t-critical value, we can determine that the difference between the estimates is significant. If the absolute value of the t-statistic is less than the t-critical value, we cannot determine there is statistical significance.
Since 2.22 > 1.96, we can determine, with 95% certainty, that there was a significantly higher rate of telehealth visits in 2020-2021 in Illinois than in 2021-2022.
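The whole Illinois example can be reproduced end to end. The sketch below assumes the two estimates come from independent samples, so that their standard errors combine as the square root of the sum of squares; that form is consistent with the 2.22 result reported here. The names `t_statistic` and `CRITICAL_VALUE` are illustrative only.

```python
import math

CRITICAL_VALUE = 1.96  # 95% confidence level, as used in this post


def t_statistic(est1: float, est2: float, se1: float, se2: float) -> float:
    """Difference in estimates divided by the combined standard error.

    Assumption: the two estimates are independent, so their variances add.
    """
    return (est1 - est2) / math.sqrt(se1**2 + se2**2)


# Illinois adults with a telehealth visit (State Health Compare values
# quoted in this post), with standard errors derived from the MOEs.
se_2020_2021 = 0.02021 / CRITICAL_VALUE
se_2021_2022 = 0.02175 / CRITICAL_VALUE

t = t_statistic(0.3128, 0.2792, se_2020_2021, se_2021_2022)
significant = abs(t) > CRITICAL_VALUE

print(f"t = {t:.2f}; significant at the 95% level: {significant}")
```

Swapping in estimates and MOEs for another state or pair of years only requires changing the four input numbers.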
What This Means & Why It Matters
There are a number of reasons why statistical testing and significance are valuable for researchers and the field of research in general.
Statistical testing is important, as it allows researchers to draw conclusions from the presence or absence of statistical significance, prevents misinterpretation of results, and preserves the integrity of research.
Statistical significance is important because it tells researchers that the true difference between two population means is not zero. In other words, we can say with a specified level of confidence that, as a result of testing, two data points or estimates (or populations, in this particular case of assessing the percentage of adults who had telehealth visits in each time period) are different from one another.
However, a finding of statistical significance cannot tell us:
- That the difference is meaningful, just that it is different
- That the comparison makes sense analytically
- That the true difference is equal to the difference we observe
For more information on statistical testing and an example of how to determine significance when looking at estimates across different states, please read the accompanying brief at this link or by clicking on the image to the right.
If you’re interested in learning more about statistical analysis, data, and their intersection with health care & insurance coverage, check out some of SHADAC’s other blogs, like:
Survey Data Season Essentials: What Is the BRFSS and How Can Researchers Use It?
August 29, 2024
This post is a part of our Survey Data Season series where we examine data from various surveys that are released annually from the summer through early fall. Find all of the Survey Data Season series posts on our Survey Data Season 2024 page here.
Each year, SHADAC covers the data releases of multiple federal surveys from a variety of agencies, beginning with the National Health Interview Survey (NHIS) in June continuing through the release of American Community Survey (ACS) and Current Population Survey (CPS) data products in September through January.
While our focus has traditionally been on the health insurance coverage data found in these surveys, we have also looked at factors related to coverage, including ‘access to care’ via measures of adults without primary doctors, for example, and ‘cost of care’ via measures such as adults who forgo needed medical care (because of cost). Both of those measures come from the BRFSS, a survey that is both part of our overall Survey Data Season coverage and is used in our annual “Comparing Federal Surveys that Count the Uninsured” brief, but is not typically used as our main source of data when analyzing health insurance coverage.*
This blog post will provide an overview of the BRFSS, answer some common questions about this survey, walk through a few examples of how we at SHADAC use BRFSS data, and review how other researchers and analysts can use it, too.
What Does “BRFSS” Stand For?
BRFSS stands for Behavioral Risk Factor Surveillance System. “BRFSS” can be treated as either an acronym and pronounced “BUR-fiss” or an initialism with each letter read out individually.
Which Federal Agency Conducts the BRFSS?
Conducted since 1984, the BRFSS is a partnership between the Centers for Disease Control and Prevention (CDC) and state health departments in U.S. states and territories, which are responsible for data collection in their area.
How Are BRFSS Data Collected?
BRFSS is an annual, telephone-based survey of U.S. adults (18 years or older) that calls landlines and cell phones via random digit dialing.
The survey questionnaire has three parts:
- The core component, which includes demographic questions and asks about health-related perceptions, conditions, and behaviors. The core is composed of a “standard” core of questions that states ask every year, and also includes a “rotating” core of alternating and emerging content (e.g., questions related to COVID were added in 2021) that is asked in even and odd years.
- The optional modules, which focus on specific health conditions and additional risk factors and that states can choose to ask.
- The state-added questions, which are additional questions added by individual states to their own questionnaire that can be used to learn more about a specific topic, or can be used to oversample groups of interest.
All participating states and territories (50 states, District of Columbia, Guam, the Commonwealth of Puerto Rico, and the U.S. Virgin Islands) are required to ask the survey questions contained in the core component, but are not required to ask anything from the optional or state-added question sections. This can make it difficult to compare data across all states or analyze trends over time, depending on the health topic area of interest.
What Kind of Information Is Found in the BRFSS?
BRFSS collects state-level data about adult “health-related risk behaviors and events, chronic health conditions, and use of preventive services.”
When Does the BRFSS Release Data?
Data from the BRFSS are released annually, but the exact timeframe can vary. Unlike the American Community Survey (ACS) and Current Population Survey (CPS), which are scheduled to come out on September 10 and September 12 this year, respectively, the BRFSS does not release data on a specific date. Rather, data usually come out in August, but can be released anytime from July to September. Researchers can check the BRFSS Annual Survey Data page to see when the data become available.
This year's data were released on August 29th; you can find them here.
Can You Pool or Combine Multiple Years of BRFSS Data?
BRFSS data can generally be pooled (i.e., combined) across multiple years of data. This is usually done to increase both sample sizes and the statistical reliability of estimates for small populations or uncommon events.
Survey questions, measures, and variable names can change across data years, though, so care should be taken to appropriately harmonize variables when pooling multiple years of BRFSS data. For example, BRFSS’ usual source of care measure changed substantially in 2021, creating a “break in series” in that year. Survey weights can generally be divided by the number of pooled data years to produce accurate weighted counts.
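The weight adjustment described above can be illustrated with a toy example. This is a minimal sketch with made-up records. The field names `year`, `weight`, and `pooled_weight` are hypothetical placeholders, not actual BRFSS variable names, and a real analysis would also need to account for the survey's complex sample design when estimating variances.

```python
def pool_weights(records: list[dict], weight_key: str = "weight") -> list[dict]:
    """Divide each record's weight by the number of distinct pooled years.

    Without this step, summing weights across N years would count the
    population roughly N times over.
    """
    n_years = len({r["year"] for r in records})
    return [{**r, "pooled_weight": r[weight_key] / n_years} for r in records]


# Made-up records: each year's weights sum to a population of 1,000.
records = [
    {"year": 2021, "weight": 600.0},
    {"year": 2021, "weight": 400.0},
    {"year": 2022, "weight": 550.0},
    {"year": 2022, "weight": 450.0},
]

pooled = pool_weights(records)
total = sum(r["pooled_weight"] for r in pooled)
print(f"Weighted count after pooling: {total:.0f}")
```

After the adjustment, the pooled weights sum to an average annual population rather than to the combined total of both years.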
Important to note for combining multiple years of BRFSS data specifically: There was a break in the BRFSS series in 2011 when the survey began including cell phones in addition to landlines. Thus, data before 2011 should not be pooled with data from 2011 forward.
How Does SHADAC Use BRFSS Data?
SHADAC draws on data from the BRFSS to produce key resources like our Comparing Federal Government Surveys That Count the Uninsured, which looks at trends in rates of uninsurance across time and across five different federal surveys, helping researchers understand how and when to most appropriately use each one.
The BRFSS is a rich source of 50-state data on health outcomes and behaviors, and thus offers researchers and analysts a venue for tracking the effects of health reforms for persons of varying income levels. Since the BRFSS income categories do not always match those set by the federal government for calculating poverty levels, SHADAC produced a helpful brief detailing Four Methods for Calculating Income as a Percent of the Federal Poverty Guidelines (FPG) in the Behavioral Risk Factor Surveillance System (BRFSS) in order to assist with this analytic process.
Choosing Between Federal Surveys That Measure Rent Affordability is another resource produced by SHADAC researchers that uses data from multiple federal surveys, including the BRFSS, to help researchers better understand ways to measure the effect of a rising housing affordability crisis across varying populations and across states in the U.S. We also use a variety of data from BRFSS measures on our State Health Compare tool.
Take a look at the BRFSS measures that are available on our State Health Compare site!
Access, Cost, and Quality of Care
Adults Who Forgo Needed Medical Care contains estimates of adults who could not get needed medical care due to cost. Uniquely for this measure, our demographic subgroups are organized not only by racial and ethnic categories and educational attainment, but also by chronic disease status. See more on this measure in this recent resource.
Adults Who Have No Personal Doctor is a measure of adults who do not have a doctor or care provider that they regularly interact with or schedule visits with for routine health care. SHADAC pulls this measure from the BRFSS and has used it in several past analyses, including this blog.
Adult Cancer Screenings provides users with the rate of adults receiving recommended cancer screenings, including pap smears (cervical cancer screening), colorectal cancer screenings, and mammograms (breast cancer screening). Check out this recent SHADAC blog for more detail on this measure and analysis by available demographics.
Adult Flu Vaccinations are a gauge for adults in the United States who received their annual vaccine to protect against influenza. Tracking flu vaccination rates can help in estimating broader vaccination trends - such as in this analysis - and help detect gaps in vaccination coverage across different demographic groups.
Health Behaviors and Outcomes
Adult Excessive Alcohol Consumption is a newer measure on State Health Compare that provides the rate of adult excessive alcohol consumption, defined as binge drinking (4 or more drinks for women or 5 or more drinks for men on one occasion) and/or heavy drinking (7 or more drinks per week for women or 15 or more drinks per week for men).
Adult Smoking and Adult E-Cigarette Use are two related measures that track annual state-level rates of adults who smoke traditional tobacco cigarettes and those who use e-cigarettes, respectively. SHADAC researchers used these data from the BRFSS to produce a series of blogs on health behaviors, including this one.
Chronic Disease Prevalence looks at the percent of the adult population who report having one or more of the following specific chronic disease types: diabetes, cardiovascular disease (CVD), heart attack, stroke and asthma.
Adult Unhealthy Days is a self-reported measure of the number of days within a month (30 days) that an individual does not have good health, either mentally or physically. Recently, SHADAC produced a 50-state resource which takes a closer look at this measure.
Activities Limited due to Health Difficulty provides the average number of days in the past 30 days that a person reports limited activity due to mental or physical health difficulties.
Stay Updated on the BRFSS Data Release and More with SHADAC’s Survey Data Season Series
We hope that this blog helps you to better understand what the BRFSS is, what kinds of information we can get from the data, and how SHADAC and other researchers can use BRFSS data to understand health care use, cost, quality, and access in each of the states and the nation as a whole.
But Survey Data Season doesn’t stop with the BRFSS. Through September, SHADAC will be covering the release of various major survey data from important federal survey sources, including the NHIS, MEPS, ACS, CPS, and, of course, the BRFSS. Stay up to date on our Survey Data Season series, with more Essentials blogs like this one along with other products analyzing newly released data, by signing up for our newsletter and following us on LinkedIn.
Want to see what we’ve already made for our Survey Data Season series? Check out the Survey Data Season archive page for a full list of everything we’ve created so far, including a blog on recently released 2023 NHIS data. Check back often for updates and new additions.
Notes
*The BRFSS has included a question about current health insurance coverage within the standard core component since 1991. However, the question simply asked whether the respondent had coverage, and not about the type of health insurance coverage a respondent might have.
Question: “Do you have any kind of health care coverage, including health insurance, prepaid plans such as HMOs, or government plans such as Medicare, or Indian Health Service?”
Answers: Yes; No; Don’t Know/Not Sure; Refused
Recently, in 2021, the BRFSS added a primary source of coverage question, making it possible to understand what portions of the national and state populations have different types of coverage.
Question: “What is the current primary source of your health insurance?”
Answers: A plan purchased through an employer or union (including plans purchased through another person's employer); A private nongovernmental plan that you or another family member buys on your own; Medicare; Medigap; Medicaid; Children's Health Insurance Program (CHIP); Military related health care: TRICARE (CHAMPUS) / VA health care / CHAMP-VA; Indian Health Service; State sponsored health plan; Other government program; No coverage of any type; Don’t Know/Not Sure; Refused
BRFSS’ measure of health insurance coverage is substantially different from those found in other federal surveys (e.g., ACS, CPS or NHIS) that allow respondents to choose multiple sources of health insurance coverage, rather than requiring respondents to select one “primary” source of coverage. Though BRFSS coverage data are generally reliable, by preventing respondents from selecting multiple sources of coverage, they present a less nuanced picture of health insurance compared to other surveys. Further, because BRFSS only surveys adults and has lower response rates than other surveys, other surveys are typically better sources of information about health insurance coverage, per se.
Publication
New Brief from SHADAC and UMN Cannabis Research Center: Using the Minnesota Student Survey to understand cannabis use and perceptions among high school students
As of 2023, Minnesota legalized cannabis for non-medical use by adults, becoming one of the now 24 states with such policies. Minnesota’s cannabis legislation limits legal cannabis use to adults aged 21 and older, similar to other states’ legislation for cannabis, tobacco, and alcohol use.
Despite the legislation prescribing this age restriction, many policymakers and other stakeholders are concerned with the impact legalized cannabis could have on public health in general and on youth populations, specifically.
“As cannabis policy continues to develop in Minnesota, and once legal sales of cannabis begin, it will be crucial to study youth cannabis use and to use those findings to fine-tune the state’s approach to minimize public health risks for youth,” says SHADAC and Cannabis Research Center (CRC) researcher Colin Planalp.
Commercial sales of legal cannabis in Minnesota have not yet begun, and they are not expected until 2025. However, the Cannabis Research Center (CRC) and SHADAC — both centers housed at the University of Minnesota’s School of Public Health — have already begun work to understand potential effects this legislation could have on youth. In fact, researchers purposefully wanted to start research and use data from before the beginning of commercial sales in order to provide key baseline evidence on cannabis use and perceptions among youth.
Thus, in their newest brief, researchers from the CRC and SHADAC used data from the Minnesota Student Survey (MSS) to study youth experiences with and perceptions of cannabis and other substances prior to legalization. This brief specifically looks at:
- Self-reported cannabis use
- Differences between demographic groups’ cannabis use
- Perceived prevalence of cannabis use by peers
You can read the brief in full here or by clicking the image to the right.
Interested in Minnesota cannabis policy? Want to learn more about the potential public health implications of cannabis use and legalization? Check out some of SHADAC and the CRC’s other collaborative pieces: