Identifying idiosyncrasies in small firm data: Understanding what’s behind the data driving small firm policy
- Tracy Cole
- 34 minutes ago
What is an “employer” firm?
Consider the following scenario:
A tutoring business opens in late 2022, staffed only by the owner.
In early 2023, after several months building a client list, she brings on a handful of college students to take on additional clients on a contract basis.
In April 2024, the owner hires her first full-time, W2 employee. This person departs in early 2025, and she begins looking for a replacement.
In May 2025, she hires another W2 employee.
In reality, this tutoring firm has had numerous workers over time, including two full-time W2 employees. But in official datasets, the firm is classified as a non-employer business in every one of these years. How can this be?
The answer is simple: national statistics on the number of non-employer and employer firms rely on administrative payroll data at a single reference point each year – the March 12 pay period. This longstanding practice aligns with a number of federal datasets, allowing for internal comparability as well as consistency over time. In addition, pinning analysis in the middle of a month avoids some of the fluctuations at both the beginning and end of the month.
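To make the mechanics concrete, here is a minimal sketch of a single-reference-date classification applied to the tutoring business above. The exact hire and departure days are illustrative assumptions (the scenario gives only months), as is the simplification of checking one calendar day rather than the pay period that contains it. The function names are hypothetical and do not come from any federal methodology.

```python
from datetime import date

# W2 employment spells as (start, end); end=None means still employed.
# Exact days are assumptions: the scenario only says "April 2024",
# "early 2025" (assumed here to fall before March 12), and "May 2025".
W2_SPELLS = [
    (date(2024, 4, 1), date(2025, 2, 14)),  # first W2 hire
    (date(2025, 5, 1), None),               # second W2 hire
]
# Contract (1099) workers are deliberately absent: they are not on payroll.


def w2_headcount_on(day, spells):
    """Count W2 employees whose employment spell covers the given day."""
    return sum(
        1 for start, end in spells
        if start <= day and (end is None or day <= end)
    )


def march_12_status(year, spells):
    """Classify the firm using only the March 12 reference date."""
    on_payroll = w2_headcount_on(date(year, 3, 12), spells)
    return "employer" if on_payroll > 0 else "non-employer"


snapshot = {year: march_12_status(year, W2_SPELLS) for year in (2023, 2024, 2025)}
print(snapshot)
```

Under these assumed dates, every March 12 falls in a gap: before the first hire in 2023 and 2024, and between the two hires in 2025. The firm comes out as a non-employer in each year, even though it had a W2 employee on payroll for most of the period from April 2024 onward.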
But while these national datasets can be useful in shedding light on small firm employment in the United States, they don’t always capture the full picture. As we’ve just seen, this approach can obscure the true nature of small firm employment. The March-to-March analysis misses activity occurring after the March 12 pay period: employees hired later in the year are not counted until the following year, and those who depart after a short time (voluntarily or otherwise) may not be counted at all. In fact, Pew has found that 14% of “employer firms” actually had no employees as of the payroll period including March 12.
Just as importantly, reliance on payroll data means that contract workers (who are not typically included on payroll) are not reflected in measures of employment. There are benefits and drawbacks to this approach. Excluding contract workers from employment estimates avoids overstating firm stability and capacity for long-term job creation where roles are short-term or task-based. On the other hand, exclusion also risks understating the true scale of job creation at small firms that rely heavily on contract workers to meet demand, manage risk, or test growth before making permanent W2 hires. Some firms also build their business model entirely around contract workers, regardless of size (for example, construction firms and accounting brokerages). Among small firms participating in the Small Firm Diaries, 1099 workers are as common as W2 workers.
Why March 12?
Much of the data we rely on today has been collected the same way for decades, even though technology has made it possible to improve data frequency and quality without much additional cost. Take for example the biweekly paycheck: it is still prevalent not because workers prefer it, but because it’s a durable artifact of a time when it was too costly to pay people more frequently.* It is certainly more feasible than ever to offer people different pay frequencies (and studies have shown workers generally prefer more frequent pay).
Some of this can be attributed to path dependence and inertia in public administration. Policies and practices often persist because they become easier to maintain than to redesign to suit changing needs and capabilities. As a result, systems are maintained not because they are the most efficient or effective, but because changing them may be costly, disruptive to the status quo, and in some cases, risky.
Why does this matter?
When trying to better understand small firms, some of the most important questions center on employment and jobs: How many jobs do small firms create? What is the quality of those jobs, particularly in terms of stability and security? Much of what we know about small firms comes from federal data, but like any data source, federal data has real constraints that limit the insight it can provide.
This becomes a problem when the data doesn’t present an accurate picture. In an earlier post, we showed how the story of small firm survival and employment rates changes meaningfully depending on what data you look at. Many federal datasets on small firm employment and survival rates, including those used by the Small Business Administration, exclude transitions between non-employer and employer businesses. As a result, commonly cited statistics for job creation and survival differ from those produced by more comprehensive datasets.
Yet this data shapes policies, programs, and products. When it comes to employment data, the County Business Patterns (CBP) is a useful example of how path dependence in public administration can have real-world effects. Produced by the U.S. Census Bureau, the CBP provides annual data on the number of establishments in a region, first quarter and annual payroll, and employment as of the pay period that includes March 12. This data is explicitly used by local governments to determine workforce sizes and industry trends, analyze local economic structures, measure program effectiveness, and plan budgets and future policy initiatives.
Similarly, to gain funding from the Economic Development Administration, applicants must develop a Comprehensive Economic Development Strategy compliant with applicable guidelines. These guidelines require a data-driven assessment of regional economic conditions, including employment estimates derived from federal data sources such as those produced by the U.S. Census Bureau. Reliance on this data can make some parts of local economies more visible than others. In practice, sectors with seasonal or highly variable workforce patterns may receive less emphasis in regional strategies, even when they provide a substantial share of jobs. This can have real consequences for workers: there may be fewer training pipelines in their region, less public investment in sectors where employment is inconsistent, or less attention to improving job quality for workers in jobs that don’t show up in the data.
Is there an ideal frequency for data collection?
Decisions about how employment data, or any data, is collected and analyzed involve several tradeoffs, which make it hard to know the “ideal” frequency in any given scenario. Data drawn from administrative sources like tax and payroll records offers clear advantages, such as reducing recall bias and survey opt-outs, while allowing for consistency and comparability over time. But when employment is measured just once a year, much is lost (that is one reason the income volatility of low-income workers was hiding in plain sight for so long, for instance). Employment volatility, seasonality, short-term work arrangements, and inconsistency in hours and pay are common features of how many people experience work, particularly in low-wage and small-firm settings.
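As a rough illustration of what a once-a-year measurement can lose, the sketch below checks the tutoring business from the opening scenario against two hypothetical schedules: one reference date per year (March 12) and one per month (the 12th). The hire and departure days are assumptions, since the scenario gives only months, and the monthly schedule is purely illustrative rather than a proposal or an existing federal practice.

```python
from datetime import date

# Assumed W2 employment spells for the tutoring business, as (start, end);
# end=None means still employed. Exact days are illustrative.
W2_SPELLS = [
    (date(2024, 4, 1), date(2025, 2, 14)),
    (date(2025, 5, 1), None),
]


def headcount_on(day, spells):
    """Count W2 employees whose employment spell covers the given day."""
    return sum(
        1 for start, end in spells
        if start <= day and (end is None or day <= end)
    )


def reference_days(first_year, last_year, months):
    """The 12th of each listed month, for every year in the range."""
    return [
        date(year, month, 12)
        for year in range(first_year, last_year + 1)
        for month in months
    ]


annual = reference_days(2023, 2025, [3])             # March 12 only
monthly = reference_days(2023, 2025, range(1, 13))   # the 12th of every month

annual_hits = [d for d in annual if headcount_on(d, W2_SPELLS) > 0]
monthly_hits = [d for d in monthly if headcount_on(d, W2_SPELLS) > 0]

print(f"annual:  employer on {len(annual_hits)} of {len(annual)} reference dates")
print(f"monthly: employer on {len(monthly_hits)} of {len(monthly)} reference dates")
```

With these assumed dates, the annual schedule never observes an employee (0 of 3 reference dates), while the monthly schedule does on 19 of 36. More frequent measurement is not free, so this is a picture of the tradeoff rather than an argument for any particular frequency.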
*If you know of good sources on the transition from daily or weekly cash pay when it was the only option, to longer periods and non-cash payments, please let us know!