Why Different Datasets Tell Different Stories: What This Means for Understanding Small Firms in the U.S.
- Tracy Cole
- Oct 27
The landscape for small firm data in the US
Commensurate with their importance in the US economy, there are many data sources at the national, state, and local levels examining small firms. Whether you’re looking for data on jobs, new business creation, or startup survival, there are a number of places you can find information.*
While more data is generally a good thing, it can be overwhelming to try to navigate the myriad datasets and statistics and make sense of the different “languages” they speak. Different datasets use varying definitions and collection methods, and often serve different purposes. (For example, the Census Bureau’s Annual Business Survey focuses on business characteristics and owner demographics, while the Bureau’s Business Dynamics Statistics reports on business creation, closure, expansions, and contractions.) As a result, there are still gaps in our understanding of many aspects of small firms. Numbers that seem comparable at first glance can tell very different stories, and we should be cautious about treating different datasets as equivalent.
We see this play out most clearly in the way different datasets report on how firms are created, how long they survive, when they close, and how many jobs they generate. Numbers on these topics can vary widely depending on how datasets are created and used. This is especially true when large, comprehensive datasets are combined with surveys that differ in whom they include, how they ask questions, and what they measure. Differences between self-reported and administrative data, and between employer and nonemployer firms, add further variation.
The Comprehensive Startup Panel: An example of how data can be improved
Robert Fairlie, an economist at UCLA, together with his colleagues Zachary Kroff, Javier Miranda, and Nikolas Zolas, recognized the limitations of existing datasets in understanding startup dynamics, job creation, and survival. For example:
Many datasets only report job creation by reference to new employer firms (i.e., those that immediately hire employees), excluding nonemployer firms that grow into employer firms over time.
Similarly, data on firm survival often exclude nonemployer firms, even though many of them close before ever hiring an employee, which leads to underestimation of overall firm failures.
These gaps lead to overly optimistic summaries of startup success, survival, and job creation, because many businesses never make it to the point of hiring, don’t survive long, or grow very slowly. These firms’ existence (and failure) is invisible in official numbers.
To address these data gaps, Fairlie and his colleagues developed the Comprehensive Startup Panel (CSP), which tracks both nonemployer and employer firms over time, covering the years 1996 to 2018.**
The findings on startups within the CSP differ significantly from earlier, more limited datasets. While the Small Business Administration often cites 50% survival of new startups after 5 years, the CSP finds that startup survival is much lower when you include nonemployer firms and follow all cohorts over time. The CSP finds that 59% of startups survive after 1 year, 47% after 2 years, and 33% after 5 years.
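To see what these cumulative figures imply about when closures happen, the short Python sketch below converts them into survival rates for each interval. It uses only the three CSP percentages quoted above. The function name and the per-year rate are illustrative additions, and the per-year rate assumes closures are spread evenly within each interval, which the source does not state.

```python
# Cumulative survival shares quoted above for the CSP (year -> share still operating).
cumulative_survival = {1: 0.59, 2: 0.47, 5: 0.33}


def conditional_survival(cumulative):
    """Return {(start_year, end_year): (interval_rate, annualized_rate)}.

    interval_rate: share of firms alive at start_year that are still alive at end_year.
    annualized_rate: the constant yearly rate that would produce interval_rate
    (an assumption made for illustration, not a figure from the CSP).
    """
    result = {}
    previous_year, previous_share = 0, 1.0  # every startup is operating at year 0
    for year in sorted(cumulative):
        share = cumulative[year]
        span = year - previous_year
        interval_rate = share / previous_share
        annualized_rate = interval_rate ** (1 / span)
        result[(previous_year, year)] = (interval_rate, annualized_rate)
        previous_year, previous_share = year, share
    return result


rates = conditional_survival(cumulative_survival)
for (start, end), (interval_rate, annualized_rate) in rates.items():
    print(
        f"years {start}-{end}: {interval_rate:.1%} survive the interval, "
        f"roughly {annualized_rate:.1%} per year"
    )
```

Read this way, the largest drop comes in the first year: about 41% of startups close before their first anniversary, while firms that reach year two close at a noticeably slower pace afterwards.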

Job creation in the CSP is also more modest than previous estimates, especially after the first years of operation. While the Small Business Administration suggests that 6 new jobs are created per startup, the CSP finds that the average entrepreneur creates 0.74 jobs in the first year after startup, 0.63 jobs after five years, and 0.57 jobs after seven years.
The transition rate from nonemployer to employer status that the CSP incorporates is an important component of aggregate job growth over time. Nonemployer startups that transition to employer firms create an average of nearly 320,000 jobs seven years after startup, representing about one-seventh of the total 2.3 million jobs created by all startups. These jobs are missing from datasets that cleanly divide businesses into employers or nonemployers.
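The "one-seventh" figure can be checked directly from the two job counts quoted above. The snippet below is a minimal sketch of that arithmetic. Both inputs are the rounded numbers from the text ("nearly 320,000" and "2.3 million"), so the result is approximate, and the variable and function names are illustrative only.

```python
# Rounded figures quoted in the text; the underlying CSP values are not given here.
TRANSITION_JOBS = 320_000   # jobs at firms that began as nonemployers ("nearly 320,000")
TOTAL_JOBS = 2_300_000      # jobs created by all startups, seven years after startup


def job_share(part, total):
    """Share of total jobs attributable to one group of firms."""
    return part / total


share = job_share(TRANSITION_JOBS, TOTAL_JOBS)
print(f"share of startup jobs from transitioning firms: {share:.1%}")
print(f"equivalent to about one job in every {1 / share:.1f}")
```

With these rounded inputs the share comes out a little under 14%, or about one job in seven, which matches the description above.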
The SFD USA approach
The Comprehensive Startup Panel data demonstrates just one way that the existing data on small firms in the U.S. can be improved. The creation of this new dataset recognizes that businesses do change form and shows us that we can’t truly understand small firms unless we track them across those changes. It also shows us the importance of examining the differences in how small firm data is created and factoring this into the way we understand and use the data.
Through high-frequency data collection and detailed surveys, the Small Firm Diaries USA allows us to question definitions and probe underlying assumptions that existing datasets may take for granted. The insights gained through SFD USA will complement existing datasets by adding nuance and depth to the way we approach and understand small firms.
Appendix: Major datasets and their methodologies
The table below gives a closer look at common, national-level datasets, demonstrating the different sources of information available:
*See Appendix for a snapshot of the large, national datasets that exist and their different methodologies.
**This is done by linking two long-standing Census Bureau panels: the Longitudinal Business Database (LBD), which tracks employer establishments over time, and the Integrated Longitudinal Business Database (ILBD), which tracks nonemployer firms.
***The Longitudinal Business Database tracks all U.S. employer establishments over time by linking annual records to provide a history of each business, allowing for the analysis of changes in employment, payroll, and other business statistics.