
Why Different Datasets Tell Different Stories: What This Means for Understanding Small Firms in the U.S.

  • Tracy Cole
  • Oct 27
  • 5 min read

The landscape for small firm data in the U.S.

Small firms are central to the U.S. economy, and a correspondingly large number of data sources at the national, state, and local levels examine them. Whether you’re looking for data on jobs, new business creation, or startup survival, there are a number of places you can find information.*


While more data is generally a good thing, navigating the myriad datasets and statistics, and making sense of the different “languages” they speak, can be overwhelming. Datasets differ in their definitions and collection methods, and they often serve different purposes. (For example, the Census Bureau’s Annual Business Survey focuses on business characteristics and owner demographics, while the Bureau’s Business Dynamics Statistics reports on business creation, closure, expansions, and contractions.) As a result, there are still gaps in our understanding of many aspects of small firms. Numbers that seem comparable at first glance can tell very different stories, and we should be cautious about treating different datasets as equivalent.


We see this play out most clearly in the way different datasets report on how firms are created, how long they survive, when they close, and how many jobs they generate. Numbers on these topics can vary widely depending on how datasets are built and used. This is especially true when large, comprehensive datasets are combined with many surveys that differ in who they include, how they ask questions, and what they measure. Differences between self-reported and administrative data, and between employer and nonemployer firms, add further variation.


The Comprehensive Startup Panel: An example of how data can be improved

Robert Fairlie, an economist at UCLA, together with his colleagues Zachary Kroff, Javier Miranda, and Nikolas Zolas, recognized the limitations of existing datasets in understanding startup dynamics, job creation, and survival. For example:

  • Many datasets report job creation only for new employer firms (i.e., those that hire employees immediately), excluding nonemployer firms that grow into employer firms over time.

  • Similarly, data on firm survival often exclude nonemployer firms, even though many of them close before ever hiring an employee. This leads to underestimation of overall firm failures.


These gaps lead to overly optimistic summaries of startup success, survival, and job creation, because many businesses never make it to the point of hiring, don’t survive long, or grow very slowly. These firms’ existence (and failure) is invisible in official numbers.


To address these data gaps, Fairlie and his colleagues developed the Comprehensive Startup Panel (CSP), which tracks both nonemployer and employer firms over time, covering the years 1996 to 2018.**


The CSP’s findings on startups differ significantly from those of earlier, more limited datasets. While the Small Business Administration often cites a five-year survival rate of 50% for new startups, the CSP finds that startup survival is much lower when you include nonemployer firms and follow all cohorts over time: 59% of startups survive one year, 47% survive two years, and 33% survive five years.
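To make these survival figures easier to compare, the short Python sketch below derives what they imply between milestones. It uses only the percentages quoted above. The conditional and annualized rates are derived for illustration and are not reported by the CSP itself, and the calculation assumes the three shares describe the same average cohort. The function and variable names are illustrative.

```python
# Survival shares quoted in the post for the Comprehensive Startup Panel (CSP).
# Keys are years since startup; values are the share of startups still operating.
csp_survival = {0: 1.00, 1: 0.59, 2: 0.47, 5: 0.33}

# Five-year survival figure the post attributes to the Small Business Administration.
sba_five_year = 0.50


def conditional_survival(curve, start, end):
    """Share of firms alive at `start` that are still alive at `end`."""
    return curve[end] / curve[start]


def annualized(rate, years):
    """Constant yearly survival rate that compounds to `rate` over `years`."""
    return rate ** (1 / years)


year_1_to_2 = conditional_survival(csp_survival, 1, 2)
year_2_to_5 = conditional_survival(csp_survival, 2, 5)
gap_points = (sba_five_year - csp_survival[5]) * 100

print(f"Closed in year 1: {1 - csp_survival[1]:.0%}")
print(f"Survive year 2, given alive at year 1: {year_1_to_2:.1%}")
print(f"Average yearly survival, years 2 to 5: {annualized(year_2_to_5, 3):.1%}")
print(f"Gap to the 50% figure at five years: {gap_points:.0f} percentage points")
```

Read this way, the steepest losses come in the first year (about 41% of startups close). Firms that get past that point survive each later year at a noticeably higher rate.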




Job creation in the CSP is also more modest than previous estimates, especially after the first years of operation. While the Small Business Administration suggests that 6 new jobs are created per startup, the CSP finds that the average entrepreneur creates 0.74 jobs in the first year after startup, 0.63 after five years, and 0.57 after seven years.


The transition from nonemployer to employer status, which the CSP captures, is an important component of aggregate job growth over time. Nonemployer startups that transition to employer firms create an average of nearly 320,000 jobs seven years after startup, representing one-seventh of the total 2.3 million jobs created by all startups. These jobs are missing from datasets that cleanly divide businesses into employers and nonemployers.
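The “one-seventh” share can be checked with a few lines of Python. This is a rough sketch using the two rounded figures quoted above (“nearly 320,000” and “2.3 million”), so the result is approximate. The variable names are illustrative.

```python
# Figures quoted in the post (both are rounded in the source).
jobs_from_transitions = 320_000     # "nearly 320,000" jobs, seven years after startup
jobs_from_all_startups = 2_300_000  # "2.3 million" jobs created by all startups

share = jobs_from_transitions / jobs_from_all_startups
one_seventh = 1 / 7

print(f"Share from nonemployer-to-employer transitions: {share:.1%}")
print(f"One-seventh, for comparison: {one_seventh:.1%}")

# Jobs that remain in the total once the transition jobs are subtracted.
remaining = jobs_from_all_startups - jobs_from_transitions
print(f"Total minus transition jobs: {remaining:,}")
```

With these rounded inputs the share comes to about 13.9%, close to one-seventh (14.3%).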


The SFD USA approach

The Comprehensive Startup Panel data demonstrates just one way that the existing data on small firms in the U.S. can be improved. The creation of this new dataset recognizes that businesses do change form and shows us that we can’t truly understand small firms unless we track them across those changes. It also shows us the importance of examining the differences in how small firm data is created and factoring this into the way we understand and use the data.


Through high-frequency data collection and detailed surveys, the Small Firm Diaries USA (SFD USA) allows us to question definitions and probe underlying assumptions that existing datasets may take for granted. The insights gained through SFD USA will complement existing datasets by adding nuance and depth to the way we approach and understand small firms.



Appendix: Major datasets and their methodologies

The table below gives a closer look at common, national-level datasets, demonstrating the different sources of information available:

| Dataset | What it covers | Methodology | Insights | Limitations |
| --- | --- | --- | --- | --- |
| Business Dynamics Statistics (BDS) - U.S. Census Bureau | Business creation, closure, expansions, and contractions | Draws from the Longitudinal Business Database***, covering employer firms. Tracks establishments and their employment over time | Useful for analyzing startup survival, firm dynamics, and job creation | Excludes nonemployer firms (sole proprietors, gig work) |
| Nonemployer Statistics (NES) - U.S. Census Bureau | Economic data on businesses without paid employees, often sole proprietors | Based on IRS tax filings of individuals with business income | Captures a large but often overlooked segment of U.S. business activity | Nonemployer businesses only; lacks detail on survival, employment, or growth |
| (name not given) | Economic data on U.S. businesses with paid employees | Draws from the Census Bureau’s Business Register, which uses administrative records (primarily IRS and Social Security data). Employment counts are taken from the pay period including March 12; payroll and receipts reflect annual totals | Comprehensive coverage of U.S. employer firms; highly detailed geographic and industry breakdowns; consistent methodology allowing trend analysis over time | Excludes nonemployer firms (sole proprietors, gig work); provides “snapshot” counts but not business dynamics such as survival or firm age |
| Annual Business Survey (ABS) - U.S. Census Bureau | Business characteristics and ownership data for U.S. employer firms, with a strong focus on demographics of business owners (race, ethnicity, gender, veteran status) | Annual survey mailed to a stratified (by state, frame, and industry) sample of employer businesses with at least one paid employee. Uses the Business Register as its frame. Sample size is about 230,000 firms per year, designed to produce nationally representative estimates | Only comprehensive source on business ownership demographics across the U.S. Allows analysis of disparities by owner characteristics; flexible design that adds rotating topical modules (e.g., R&D, innovation) | Covers only employer businesses (excludes nonemployers, which make up the majority of U.S. businesses); some topical content rotates, meaning not all variables are available every year |
| (name not given) | Employment and wages for businesses covered by Unemployment Insurance (UI) laws | Comprehensive administrative data covering ~95% of U.S. jobs | Very detailed at the county and industry level | Excludes informal businesses and firms not covered by UI |

*See Appendix for a snapshot of the large, national datasets that exist and their different methodologies.


**This is done by linking two long-standing Census Bureau panels: the Longitudinal Business Database (LBD), which tracks employer establishments over time, and the Integrated Longitudinal Business Database (ILBD), which tracks nonemployer firms.


***The Longitudinal Business Database tracks all U.S. employer establishments over time by linking annual records to provide a history of each business, allowing for the analysis of changes in employment, payroll, and other business statistics.
