Blog & News
What Are Statistical Tests? Statistical Testing and How to Determine Statistical Significance — A SHADAC Basics Blog
September 06, 2024: Basics Blog Introduction
SHADAC has created a series of “Basics Blogs” to familiarize readers with common terms, concepts, and topics that are frequently covered.
This Basics Blog will focus on the topic of statistical tests and testing, including what statistical testing is, how to do a statistical test (specifically, how to do a t-test), related definitions, and how to interpret statistical significance, within the context of data drawn from SHADAC’s State Health Compare.
Keep reading below to learn more about statistical testing and statistical significance.
What Is Statistical Testing?
Statistical testing can tell us whether there is enough evidence to conclude that one estimate is statistically different from another, within a certain level of confidence.
There are various types of statistical tests that are used in different situations. The test used depends on the type of data, variable types, data distribution, and other factors. Some examples of statistical tests include:
- T-Test
- Z-Test
- ANOVA test
Researchers at SHADAC most often use a “t-test” to test for significance in our research and analysis. T-tests are typically used to compare the means or percentages of two independent populations.
Definitions
To get a better understanding of statistical testing and its role in data-driven research, we will walk through an example using a t-test and then determine statistical significance from its results. Before that, however, we need to define some related terms and values that are used in the t-test formula.
The following formula is used to conduct t-tests:

t-statistic = (estimate 1 − estimate 2) / √(SE₁² + SE₂²), where SE₁ and SE₂ are the standard errors of the two estimates.
Let’s break down this formula and look at the definitions of each of its parts:
Statistical Estimates: approximations of an unknown true population value, typically calculated from a sample.
Statistical estimates are the “best guess” of a true population.
Difference in Estimates: the mathematical difference between the statistical estimates (estimate 1 – estimate 2).
In a t-test, estimates are typically a percentage or mean.
Standard Error (SE): estimate of the (un)certainty and precision of a given estimate.
Smaller standard error = less uncertainty, greater precision
Larger standard error = more uncertainty, less precision
T-Critical Value: a value that is the threshold for significance.
T-critical values are set based on the confidence level being used. 1.96 is the t-critical value for tests using a 95% confidence level, which is commonly used.
T-Statistic Value: the calculated test value used to determine whether a difference is significant or not.
The t-statistic value is compared to the t-critical value to determine significance.
Margin of Error (MOE): expresses, for a specified confidence level, the range around the provided estimate within which the true population value is expected to lie.
The smaller the margin of error, the greater the precision of the estimate.
Confidence Interval: expresses the interval in which we expect the true population value to fall at a given level of certainty.
The higher the confidence level (ex: 99% compared to 95%), the greater the certainty. For example, we could say we are 95% confident that a difference is significant.
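The relationship between these terms can be sketched in a few lines of Python. This is a minimal illustration, not SHADAC code, and the estimate and standard error below are hypothetical numbers:

```python
# Sketch: how an estimate, its standard error (SE), the margin of error
# (MOE), and a 95% confidence interval relate. Numbers are hypothetical.

T_CRITICAL_95 = 1.96  # t-critical value for a 95% confidence level

def confidence_interval(estimate, standard_error, t_critical=T_CRITICAL_95):
    """Return (MOE, lower bound, upper bound) for the given estimate."""
    moe = t_critical * standard_error  # margin of error
    return moe, estimate - moe, estimate + moe

# Example: a 30% estimate with a standard error of 1 percentage point.
moe, lower, upper = confidence_interval(0.30, 0.01)
print(f"MOE = {moe:.4f}; 95% CI = ({lower:.4f}, {upper:.4f})")
```

A smaller standard error shrinks the MOE and narrows the interval, which is the "greater precision" described above.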
How to Do a T-Test: Using State Health Compare
Now, let’s put all of those together. Using State Health Compare, let’s figure out whether the percentages of adults who had a telehealth visit in Illinois in 2020-2021 and 2021-2022 are statistically different from each other.
In 2020-2021, 31.28% of adults had a telehealth visit. In 2021-2022, 27.92% of adults had a telehealth visit.
Figure 1. Percent of Adults in Illinois Who Had a Telehealth Visit by Year
While we can see that a higher percentage of adults had a telehealth visit in 2020-2021 compared to 2021-2022, this doesn’t necessarily mean that the difference is statistically significant.
To find out whether these percentages are statistically different from each other at a 95% confidence level, we will conduct a t-test.
First, we will set our t-critical value, which corresponds to the confidence level being used. We will be using a 95% confidence level, which corresponds to a t-critical value of 1.96.
The data from State Health Compare give us the percentages for each time period, as well as the margins of error (MOE). The MOEs for 2020-2021 and 2021-2022 are 0.02021 and 0.02175, respectively.
Using the MOE, we can calculate the standard error (SE). To calculate the SE, divide the MOE by our t-critical value, 1.96.
SE for 2020-2021: 0.02021 / 1.96 = 0.01031
SE for 2021-2022: 0.02175 / 1.96 = 0.01109
We will need these numbers to calculate our t-statistic. However, before we can calculate the t-statistic, we need one more value: the difference in estimates.
This value is calculated by subtracting one estimate from the other. We will use the percentages of adults who had a telehealth visit in each time period.
Difference in Estimates of 2020-2021 & 2021-2022: 0.3128 – 0.2792 = 0.0336
Now, we have all the values needed to calculate our t-statistic, which is the ratio of the difference between the two estimates to the variability of the estimates. In other words, it answers: “How close to (or how far away from) ‘0’ is the difference, given the variability between the two estimates?”
The t-statistic is calculated by plugging the numbers from above into the formula: t = 0.0336 / √(0.01031² + 0.01109²) ≈ 2.22.
We can now compare this t-statistic to the t-critical value we determined above, which was 1.96. If the absolute value of the t-statistic (its value regardless of a positive or negative sign) is greater than the t-critical value, the difference between the estimates is statistically significant. If the absolute value is less than the t-critical value, we cannot conclude that there is statistical significance.
Since 2.22 > 1.96, we can determine, with 95% confidence, that a significantly higher percentage of adults had a telehealth visit in Illinois in 2020-2021 than in 2021-2022.
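The worked example above can be reproduced in a short Python sketch. The helper function and its name are our own for illustration; the estimates and MOEs come from the State Health Compare example:

```python
# Sketch of the t-test walked through above, using the Illinois telehealth
# estimates from State Health Compare.
from math import sqrt

T_CRITICAL = 1.96  # t-critical value for a 95% confidence level

def t_statistic(est1, moe1, est2, moe2, t_critical=T_CRITICAL):
    """Compare two independent estimates given their margins of error."""
    se1 = moe1 / t_critical  # standard error of estimate 1
    se2 = moe2 / t_critical  # standard error of estimate 2
    diff = est1 - est2       # difference in estimates
    return diff / sqrt(se1 ** 2 + se2 ** 2)

t = t_statistic(0.3128, 0.02021, 0.2792, 0.02175)  # 2020-2021 vs. 2021-2022
print(round(t, 2))           # 2.22
print(abs(t) > T_CRITICAL)   # True: the difference is significant at 95%
```

Because |2.22| exceeds the 1.96 threshold, the code reaches the same conclusion as the hand calculation.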
What This Means & Why It Matters
There are a number of reasons why statistical testing and significance are valuable for researchers and the field of research in general.
Statistical testing is important, as it allows researchers to draw conclusions from the presence or absence of statistical significance, prevents misinterpretation of results, and preserves the integrity of research.
Statistical significance is important because it tells researchers that the true difference between two population means is greater than zero. In other words, we can say with a specified level of confidence that two data points or estimates (or populations, in this particular case of assessing the percentage of adults who had telehealth visits in different years) are different from one another.
However, a finding of statistical significance cannot tell us:
- That the difference is meaningful, just that it is different
- That the comparison makes sense analytically
- That the true difference is equal to the difference we observe
For more information on statistical testing and an example of how to determine significance when looking at estimates across different states, please read the accompanying brief at this link.
If you’re interested in learning more about statistical analysis, data, and their intersection with health care & insurance coverage, check out some of SHADAC’s other blogs, like:
Race/Ethnicity Data in CMS Medicaid (T-MSIS) Analytic Files: 2021 Data Assessment
December 6, 2023: The Transformed Medicaid Statistical Information System (T-MSIS) is the largest national database of current Medicaid and Children’s Health Insurance Program (CHIP) beneficiary information collected from U.S. states, territories, and the District of Columbia (DC).1 T-MSIS data are critical for monitoring and evaluating the utilization of Medicaid and CHIP, which together provide health insurance coverage to almost 90 million people.2
Due to their size and complexity, T-MSIS data files are challenging to use directly for research and analytic purposes. To optimize these files for health services research, Centers for Medicare and Medicaid Services (CMS) repackages them into a user-friendly, research-ready format called T-MSIS Analytic Files (TAF) Research Identifiable Files (RIF). One such file, the Annual Demographic and Eligibility (DE) file, contains race and ethnicity information for Medicaid and CHIP beneficiaries. This information is vital for assessing enrollment, access to services, and quality of care across racial and ethnic groups in the Medicaid/CHIP population, whose members are particularly vulnerable due to limited income, physical and cognitive disabilities, old age, complex medical conditions, housing insecurity, and other social, economic, behavioral, and health needs.
To guide researchers and other consumers in their use of T-MSIS data, CMS produces data quality assessments of the completeness of race and ethnicity data along with other data such as enrollment, claims, expenditures, and service use. The Data Quality (DQ) assessments for race and ethnicity data have been posted for data years 2014 through 2021 and indicate varying levels of “concern” regarding race and ethnicity data completeness. Some data years have multiple data versions (e.g., Preliminary, Release 1, Release 2), each with their own DQ assessment.
While completeness of race and ethnicity data reported to CMS has historically been inconsistent among the states, territories, and DC, SHADAC has been monitoring the quality of these data over time, and we are excited to report a noticeable improvement. This blog explores the 2021 Data Release 1, the most recent T-MSIS race and ethnicity data for which a DQ assessment is available, along with a brief analysis of data quality trends over time that we plan to follow in future T-MSIS file releases.
Evaluation of T-MSIS Race and Ethnicity Data
DQ assessments for each year and data version of T-MSIS data are housed in the Data Quality Atlas (DQ Atlas), an online evaluation tool developed as a companion to T-MSIS data.3 The DQ Atlas assesses T-MSIS race and ethnicity data using two criteria: the percentage of beneficiaries with missing race and/or ethnicity values in the TAF; and the number of race/ethnicity categories (out of five) that differ by more than ten percentage points between the TAF and American Community Survey (ACS) data. Taken together, these two criteria indicate the level of “concern” (i.e., reliability) for states’ T-MSIS race/ethnicity data. To construct the external ACS benchmark for evaluating T-MSIS data, creators of the DQ Atlas combine race and ethnicity categories in the ACS to mirror race and ethnicity categories reported in the TAF (see Table 1). More information about the evaluation of T-MSIS race and ethnicity data is available in the DQ Atlas’ Background and Methods Resource.
Five “concern” categories appear in the DQ Atlas: Low Concern, Medium Concern, High Concern, Unusable, and Unclassified. States with substantial missing race/ethnicity data or race/ethnicity data that are inconsistent with the ACS – a premier source of demographic data – are grouped into either the High Concern or Unusable categories, whereas states with relatively complete race/ethnicity data or race/ethnicity data that align with ACS estimates are grouped into either the Low Concern or Medium Concern categories. The Unclassified category includes states for which benchmark data are incomplete or unavailable for a given data year and version.
Table 1. Crosswalk of Race and Ethnicity Variables between the TAF and ACS
| Race/Ethnicity Category | Race/Ethnicity Flag Value in TAF | Combination of Race and Hispanic Variables in ACS |
| --- | --- | --- |
| Hispanic, all races | 7 = Hispanic, all races | Hispanic, all races |
| Other races, non-Hispanic | 4 = American Indian and Alaska Native, non-Hispanic; 5 = Hawaiian/Pacific Islander; 6 = Multiracial, non-Hispanic | American Indian alone; Alaska Native alone; American Indian and Alaska Native tribes specified, or American Indian or Alaska Native, non-specified and no other race; Native Hawaiian and other Pacific Islander alone; Some other race alone; Two or more races |
Source: Medicaid.gov. (n.d.). DQ Atlas: Background and methods resource [PDF file]. Available from https://www.medicaid.gov/dq-atlas/downloads/background-and-methods/TAF-DQ-Race-Ethnicity.pdf. Accessed December 1, 2023.
Quality Assessment by State
Table 2 shows the Race and Ethnicity DQ Assessments for the 2021 TAF (Data Version: Release 1). The categorization criteria used to determine the levels of concern for the 2021 TAF Release 1 data are the same as those used to assess T-MSIS data from previous years and versions. Sixteen states received a rating of “Low Concern,” and 22 states (including Puerto Rico [PR]) fell into the “Medium Concern” category.
Most of the “Medium Concern” states (19 of 22) fell into the subcategory denoting the higher percentage range of missing race/ethnicity data (from 10 percent up to 20 percent). A similar pattern can be seen among the “High Concern” states, most of which (8 of 11) fell into the subcategory denoting the highest percentage range of missing race/ethnicity data (from 20 percent up to 50 percent).
Finally, 11 states (including DC) received a rating of “High Concern.” Three states (Massachusetts, Tennessee, and Utah) received an “Unusable” rating, as each of these states was missing at least 50 percent of race/ethnicity data. The Virgin Islands (VI) is the only state/territory categorized as “Unclassified” in the 2021 TAF (Data Version: Release 1) due to insufficient or incomplete data, and does not appear in Table 2.
Table 2. Race and Ethnicity Data Quality Assessment, 2021 T-MSIS Analytic File (TAF) Data Release 1
| Data quality assessment | Percent of beneficiaries with missing race/ethnicity values | Number of race/ethnicity categories where TAF differs from ACS by more than 10% | Number of states* | States |
| --- | --- | --- | --- | --- |
| Low Concern | <10% | 0 | 16 | AK, DE, GA, KS, MI, MO, NE, NV, NM, NC, ND, OH, OK, PA, SD, WA |
| Medium Concern | <10% | 1 or 2 | 3 | ID, IL, VA |
| Medium Concern | 10% to <20% | 0 or 1 | 19 | AL, AR, CA, CO, FL, IN, KY, ME, MD, MN, MS, MT, NH, NJ, PR, TX, VT, WV, WI |
| High Concern | <10% | 3 or more | 1 | RI |
| High Concern | 10% to <20% | 2 or more | 2 | AZ, LA |
| High Concern | 20% to <50% | Any value | 8 | CT, DC, HI, IA, NY, OR, SC, WY |
| Unusable | ≥50% | Any value | 3 | MA, TN, UT |
Notes: *T-MSIS includes all 50 states, the District of Columbia (DC), and the U.S. territories of Puerto Rico (PR) and the Virgin Islands (VI). However, a DQ assessment is not available for VI in the 2021 TAF (Data Version: Release 1) due to incomplete/unavailable data.
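The categorization rules in Table 2 can be expressed as a small decision function. This is our own sketch of the logic implied by the table, not CMS code, and the function name and exact boundary handling are assumptions:

```python
# Sketch of the DQ Atlas concern-level logic implied by Table 2.
# Boundary handling (e.g., exactly 10% or 50% missing) is assumed from the
# table's ranges and the surrounding text, not taken from CMS documentation.

def dq_concern(missing_pct, categories_differing):
    """Map a state's percent of missing race/ethnicity values and number of
    race/ethnicity categories differing from the ACS to a concern level."""
    if missing_pct >= 50:
        return "Unusable"
    if missing_pct >= 20:
        return "High Concern"  # any number of differing categories
    if missing_pct >= 10:
        return "Medium Concern" if categories_differing <= 1 else "High Concern"
    # Less than 10% missing:
    if categories_differing == 0:
        return "Low Concern"
    return "Medium Concern" if categories_differing <= 2 else "High Concern"

print(dq_concern(5, 0))   # Low Concern
print(dq_concern(15, 2))  # High Concern
print(dq_concern(55, 0))  # Unusable
```

Note that states in the Unclassified category fall outside this logic entirely, since their benchmark data are incomplete or unavailable.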
Despite ongoing variation in the completeness of race and ethnicity data reported to CMS, SHADAC researchers have noted a trend toward better quality data overall. Since beginning to track these quality assessments with the 2019 T-MSIS TAF release, a number of states have shifted up the quality assessment scale with noticeably fewer states seeing their data classified as “High Concern.” Specifically, 2021 race/ethnicity TAF data from 11 states received a rating of “High Concern” compared to 16 states’ data in 2020 and 17 states’ data in 2019. The number of states with “Unusable” data has also dropped each year – 3 states’ 2021 race/ethnicity TAF data was classified as “Unusable” compared to 4 states’ data in 2020 and 5 states’ data in 2019.
Visualizing T-MSIS Data in the DQ Atlas
The DQ Atlas enables users to generate maps and tables that compare the quality of T-MSIS data between states across different topics, such as race/ethnicity, age, income, and gender (see Figure 1). Visualizing T-MSIS data in this manner can help researchers quickly assess the completeness of a single variable as well as the relative completeness (or incompleteness) of certain variables compared to others. For example, in the 2021 TAF Data Release 1, all states and territories received a “Low Concern” rating for age data, whereas only 31 states and territories received a “Low Concern” rating for family income.
Figure 1. Data Quality Assessments of Beneficiary Information by U.S. State/Territory
Notes: Green = low concern; yellow = medium concern; orange = high concern; red = unusable; grey = unclassified.
Source: Medicaid.gov. (n.d.). DQ Atlas: Race and Ethnicity [2021 Data set: Version: Release 1]. Available from https://www.medicaid.gov/dq-atlas/landing/topics/single/map?topic=g3m16&tafVersionId=35 Accessed December 1, 2023.
Looking Ahead
Increasingly, a wide diversity of voices from non-profits, health insurers, state-based marketplaces, and policymakers have called for improving the collection of race, ethnicity, and language data, often with the goal of advancing health equity. CMS’s efforts to improve the quality and availability of T-MSIS data reflect this nationwide movement toward data collection practices that more accurately capture the diversity of the U.S. population.
SHADAC was excited to see the proposed revisions to the Office of Management and Budget (OMB) standards related to the collection of race and ethnicity data. The proposed revisions align with available evidence, are consistent with the changes made by leading states, and, most importantly, explicitly state that these standards should serve as a minimum baseline with a call to collect and provide more granular data. However, while these standards are specifically named as minimum reporting categories for data collection throughout the Federal Government, if adopted they are likely to shape data collection and reporting across all sectors, including the states that collect race/ethnicity data through the Medicaid application process.
Many states express difficulties reporting data, as there is misalignment in how state eligibility systems, Medicaid Management Information System (MMIS), and T-MSIS format race and ethnicity data. Before states submit data to T-MSIS, they must reformat and aggregate data, which may affect the quality of submitted data. One approach to improve the collection and reporting of data is providing states with an updated model application using evidence-based approaches to race and ethnicity questions that improve applicant response rate and data accuracy.
Sources
1 Medicaid.gov. Transformed Medicaid Statistical Information System (T-MSIS). Retrieved October 20, 2022, from https://www.medicaid.gov/medicaid/data-systems/macbis/transformed-medicaid-statistical-information-system-t-msis/index.html#
2 Medicaid.gov. August 2023 Medicaid & CHIP Enrollment Data Highlights. Retrieved December 1, 2023, from https://www.medicaid.gov/medicaid/program-information/medicaid-and-chip-enrollment-data/report-highlights/index.html
3 Saunders, H., & Chidambaram, P. (April 28, 2022). Medicaid Administrative Data: Challenges with Race, Ethnicity, and Other Demographic Variables. Kaiser Family Foundation. Retrieved October 31, 2022, from https://www.kff.org/medicaid/issue-brief/medicaid-administrative-data-challenges-with-race-ethnicity-and-other-demographic-variables/
4 Wang, H.L. (June 15, 2022). Biden officials may change how the U.S. defines racial and ethnic groups by 2024. NPR. Retrieved November 1, 2022, from https://www.npr.org/2022/06/15/1105104863/racial-ethnic-categories-omb-directive-15
5 Diaz, J. (August 16, 2022). California becomes the first state to break down Black employee data by lineage. NPR. Retrieved November 1, 2022, from https://www.npr.org/2022/08/16/1117631210/california-becomes-the-first-state-to-break-down-black-employee-data-by-lineage
6 The New York State Senate. (December 22, 2021). Assembly Bill A6896A. Retrieved November 2, 2022, from https://www.nysenate.gov/legislation/bills/2021/A689
SHADAC Responds to Proposed American Community Survey (ACS) Sexual Orientation and Gender Identity (SOGI) Test Questions
December 4, 2023
View the U.S. Census Bureau's full request for comments in the September 19th edition of the Federal Register.
On September 19, 2023, the U.S. Census Bureau released a request for comments on the proposed addition of test questions regarding sexual orientation and gender identity (SOGI) for the American Community Survey (ACS). According to the notice in the Federal Register, the Census Bureau specifically hopes to test question wording, response categories, and placement within the survey itself.
Researchers at SHADAC reviewed the proposed test questions included in the Register proposal, as well as the methodology and reasoning behind the Census Bureau’s choices, and responded with comments regarding the measurement of sex and gender identity. Specifically, researchers discuss the limitations of the two-step gender identity questions, language and inclusivity concerns, and recommendations for a more streamlined and accessible two-step question format.
SHADAC’s Comments on Measuring Sex and Gender Identity
When designing survey questions, the consumer experience is paramount. Maximizing the accessibility and acceptability of question language improves data quality in multiple dimensions, including item non-response, misclassifications, and overall response rates.
In the case of measuring sex and gender identity, context matters. It is important to acknowledge how these questions might differ in various settings: when asked on a survey compared to when asked in an administrative or clinical setting, for example. We are concerned that the ACS is missing a key opportunity to update questions on sex and gender in ways that both enhance user experience and are specific to the survey setting.
The test questions for sex and gender identity as proposed use overly academic language that is better suited for a clinical setting, by asking first ‘what sex was NAME assigned at birth’ followed by ‘current gender identity.’ While such questions have utility, such as for verification of specific health insurance benefits, this approach is not optimal for a population survey such as the ACS.
Unnecessary jargon makes questions less accessible for respondents with lower literacy levels or who are non-English speaking and adds to the cognitive burden for all respondents. Survey language should minimize the respondent burden in order to support data quality and user experience. The limitations of the proposed two-step gender identity question have been described by the National Academies of Sciences, Engineering, and Medicine (NASEM).
Specific concerns worth highlighting are:
1) The proposed response options for ‘current gender’ are not inclusive of transgender experiences because these options imply that transgender is a tertiary or ‘other’ category and mutually exclusive from male or female identities. Allowing for multiple answers (one of the proposed test options) does not address this conceptual limitation.
2) Asking chronologically about ‘sex assigned at birth’ followed by ‘current gender identity’ may be perceived as invasive and/or invalidating for transgender respondents, which could increase item nonresponse for this critical population.
3) Asking a third question for verification of gender status when a respondent’s answers to ‘sex assigned at birth’ and ‘current gender’ don’t match places an undue burden on the transgender and nonbinary population. At minimum, the testing process should assess false positive rates and seek to avoid unnecessarily burdensome questioning of transgender and nonbinary people.
4) ‘Sex assigned at birth’ is not inclusive of intersex or nonbinary designations on infant birth certificates. These situations are increasingly common, and the current wording could lead to false positives for transgender, along with unnecessarily invasive questioning among individuals born with intersex traits.
SHADAC recommends that the Census consider a more streamlined two-step question approach that gathers the same information (sex assigned at birth and current gender) while providing a more inclusive and accessible experience for respondents. Specifically, we recommend asking first ‘what is your gender’ followed by ‘are you transgender.’ This approach was developed in Oregon via extensive stakeholder engagement. Similar language has also been used by administrators to update population survey questions in Minnesota.
The alternative two-step question addresses the limitations described above in the following ways:
1) Response options for ‘gender’ should include male, female, nonbinary, and a write-in response option. Asking about transgender identity in a separate question avoids portraying transgender as mutually exclusive with male or female. For respondents who need an explanation for ‘transgender,’ a hover box or an interviewer can provide a definition as follows: ‘Transgender describes a person whose gender identity differs from their sex assigned at birth.’
2) Asking simply about ‘gender’ first is clear and inclusive. Avoiding the ‘sex assigned at birth’ initial question would be less duplicative and more accessible for many respondents.
3) Asking directly about transgender identity (with ‘yes/no’ response options) prioritizes accessible language to minimize respondent burden and may eliminate the need for additional verification for transgender respondents. Ethically, the ACS should avoid asking all transgender respondents for extra verification without strong data to indicate that doing otherwise would lead to significantly elevated false positive rates.
4) Not asking about ‘sex assigned at birth’ avoids unnecessary collection of personal health data. This supports privacy for all respondents. Additionally, this approach could help reduce item nonresponse and false positives among intersex individuals as well as cisgender respondents who are unfamiliar with and/or dislike the language and concepts in the initially proposed test questions.
Thank you for your consideration. We know that the Census Bureau faces many important decisions and appreciate the chance to share our feedback on this important content test.
Publication
Health Equity Measurement: Considerations for Selecting a Benchmark (SHVS Brief)
The following content is cross-posted from State Health & Value Strategies.
Authors: Emily Zylla, Andrea Stewart, and Elizabeth Lukanen, SHADAC
As states look to advance health equity, they need ways to measure whether their efforts result in improvements. Benchmarking can be used to identify health disparities and establish a standard for evaluating efforts to address health inequities.
This issue brief summarizes the advantages and disadvantages of four common approaches to health equity benchmarking:
1) Using the best-performing group as a reference
2) Using the most socially advantaged group as a reference
3) Comparing against a population average
4) Comparing against a set target or goal
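The four approaches above can be made concrete with a small sketch showing how the measured disparity for a subgroup changes with the benchmark chosen. All subgroup names, rates, and the "most advantaged group" designation below are invented for illustration:

```python
# Hypothetical subgroup rates for some health measure (higher = better).
# Every name and number here is invented for illustration only.
from statistics import mean

rates = {"Group A": 0.82, "Group B": 0.74, "Group C": 0.69}

benchmarks = {
    "best-performing group": max(rates.values()),  # approach 1
    "most advantaged group": rates["Group A"],     # approach 2 (assumed)
    "population average": mean(rates.values()),    # approach 3 (unweighted)
    "set target": 0.90,                            # approach 4
}

# The same subgroup shows a different "gap" under each benchmark.
for name, benchmark in benchmarks.items():
    gaps = {group: round(benchmark - rate, 2) for group, rate in rates.items()}
    print(f"{name}: {gaps}")
```

Note how the population-average benchmark produces a negative gap for the best-performing group, while a set target shows every group falling short; this is exactly the kind of trade-off the brief asks states to weigh.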
Key Findings
- There is no single ideal benchmark for health equity measurement, and it is important to weigh the advantages and disadvantages of each approach before selecting one.
- The rationale for selecting a benchmarking approach should be thoroughly explained and accompanied by detailed context and interpretation. That context should acknowledge the role of societal inequality and structural racism in driving disparities, to prevent the perception that individual subgroups carry responsibility for the observed disparities.
Conclusion
There is no universal “best” approach to selecting a benchmark for health equity measurement. Ultimately, careful and detailed documentation of benchmarking methodology and other measurement choices, paired with a discussion of the root causes of inequities, connects disparate outcomes to the disparities in power and privilege in which they are rooted and maintains the focus on the goal of advancing health equity.
To read the brief in its entirety, click here.
About the Author/Grantee:
State Health and Value Strategies (SHVS) assists states in their efforts to transform health and healthcare by providing targeted technical assistance to state officials and agencies. The program is a grantee of the Robert Wood Johnson Foundation, led by staff at Princeton University’s School of Public and International Affairs. The program connects states with experts and peers to undertake healthcare transformation initiatives. By engaging state officials, the program provides lessons learned, highlights successful strategies and brings together states with experts in the field. Learn more at www.shvs.org.
This issue brief was prepared by Emily Zylla, Andrea Stewart, and Elizabeth Lukanen. The State Health Access Data Assistance Center (SHADAC) is an independent, multi-disciplinary health policy research center housed in the School of Public Health at the University of Minnesota with a focus on state policy. SHADAC produces rigorous, policy-driven analyses and translates its complex research findings into actionable information for states. Learn more at www.shadac.org.