Most of What Operators Call "Testing" Is Actually Just Reacting

A gym tries something new. A different ad headline, a new lead source, a revised tour script, a shifted rep schedule. Two weeks later, the sales manager pulls the numbers and decides whether it worked. If joins are up, it worked. If joins are down, it didn’t. The change stays or it gets reversed, and everyone moves on to the next thing.

That’s not testing. That’s reacting. And most of the back-and-forth in gym sales strategy isn’t strategy at all; it’s operators reacting to noise and then reacting to their own reactions.

The distinction matters precisely because the two look identical from the outside. Both involve trying something, watching the numbers, and making a call. The difference lies in what got decided before the change was made. A real test has a pre-committed window and a pre-committed success metric: before launch, you decide how long you’ll run it, which number has to move, and by how much for the change to count. Most gym “tests” have neither, so the decision gets made against whatever the numbers happened to do in whatever window the manager happened to check.

Here’s what that looks like in practice. A gym tries a new lead source. Week one brings 12 leads and 2 joins. Week two brings 15 leads and 1 join. The sales manager looks at the numbers, decides the source isn’t working, and cuts it. Six weeks later, a different channel hits a similarly rough stretch and gets kept because “it’s been working historically.” Neither decision was based on a standard. Both were based on what the numbers happened to show on the day someone looked.
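To see why two weeks of numbers like these can’t settle anything, here’s a minimal Python sketch (standard library only) that puts an approximate 95% Wilson score interval around each week’s lead-to-join rate from the example. The Wilson interval is simply one standard way to express uncertainty for small counts; choosing it is our assumption, not part of the method described here, and the function name is hypothetical.

```python
from math import sqrt

def wilson_interval(joins: int, leads: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a conversion rate (joins / leads)."""
    if leads <= 0:
        raise ValueError("leads must be positive")
    p = joins / leads
    denom = 1 + z**2 / leads
    center = (p + z**2 / (2 * leads)) / denom
    margin = z * sqrt(p * (1 - p) / leads + z**2 / (4 * leads**2)) / denom
    return max(0.0, center - margin), min(1.0, center + margin)

# Counts from the lead-source example above.
week_one = wilson_interval(joins=2, leads=12)
week_two = wilson_interval(joins=1, leads=15)

for label, (lo, hi) in [("Week one", week_one), ("Week two", week_two)]:
    print(f"{label}: plausible join rate {lo:.0%} to {hi:.0%}")

# Overlapping intervals mean the data can't tell the two weeks apart.
overlap = week_one[0] < week_two[1] and week_two[0] < week_one[1]
print("Intervals overlap:", overlap)
```

With these counts the plausible join rate runs from roughly 5% to 45% in week one and roughly 1% to 30% in week two. The ranges overlap heavily, so “2 joins, then 1 join” is consistent with the source performing exactly the same both weeks.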

The problem isn’t that operators make mistakes. The problem is that without a pre-committed window and metric, there’s no way to tell whether a decision was right or wrong, even in hindsight. You can’t learn from a test you never actually ran. All you can do is react to the next two weeks of noise, then the next two, then the next two, and that’s how gyms end up perpetually tweaking without ever improving.

The contrarian claim is this. Most of what operators call testing is reacting, and most of what they call learning is just building a story around noise. If you can’t say, before you launch, what window you’ll evaluate on and what number has to move, you’re not testing. You’re changing things and hoping.

The practical version is three steps, and it takes about ten minutes per change.

Before you launch the change, write down what you’re testing. Not “we’re trying a new headline,” but “we’re testing whether a benefit-led headline increases lead-to-tour booking rate versus the current feature-led headline.”

Write down the window. For most sales changes in a gym, 30 days is the floor, and 45 to 60 days is better. Anything shorter is almost always inside the noise band, and you’ll end up making the call based on which day of the week you checked. If the change is a lead-generation tactic, extend the window to 60 to 90 days, because the conversion tail runs longer than a month.

Write down the success metric and the threshold. Not “we want better results,” but “lead-to-tour booking rate needs to move from 38% to at least 45% over the 45-day window for us to keep the change.” Pick the number before you see the data. If you pick it after, you’re not evaluating; you’re rationalizing.

Then run the thing, leave it alone, and evaluate at the end of the window against the number you wrote down. Not before. Not at week two when you get nervous. Not at week three when the owner asks how it’s going.
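For teams that track this in a spreadsheet or script, the three steps fit in a small record written before launch. Below is a minimal Python sketch under that assumption. The `ExperimentPlan` class, the `evaluate` function, the field names, the start date, and the lead and booking counts are all hypothetical; only the 38%-to-45% bar and the 45-day window come from the headline example. The sketch refuses to return a verdict before the window closes.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass(frozen=True)  # frozen: the plan can't be edited after launch
class ExperimentPlan:
    """Hypothetical record of what's being tested, for how long, and what counts as success."""
    hypothesis: str
    metric: str
    baseline_rate: float  # current rate, recorded for context only
    target_rate: float    # minimum observed rate required to keep the change
    start: date
    window_days: int

    @property
    def end(self) -> date:
        return self.start + timedelta(days=self.window_days)

def evaluate(plan: ExperimentPlan, booked: int, leads: int, today: date) -> str:
    """Compare the observed rate to the pre-committed target, only once the window has closed."""
    if today < plan.end:
        return f"Too early: evaluate on {plan.end.isoformat()}"
    if leads == 0:
        return "No leads in window: the test is inconclusive"
    observed = booked / leads
    verdict = "keep" if observed >= plan.target_rate else "revert"
    return f"{verdict}: observed {observed:.0%} vs target {plan.target_rate:.0%}"

# Hypothetical plan, written down before launch.
plan = ExperimentPlan(
    hypothesis="Benefit-led headline raises lead-to-tour booking rate vs feature-led headline",
    metric="lead-to-tour booking rate",
    baseline_rate=0.38,
    target_rate=0.45,
    start=date(2024, 3, 1),
    window_days=45,
)

# Hypothetical counts, for illustration only.
print(evaluate(plan, booked=20, leads=50, today=date(2024, 3, 15)))  # Too early: evaluate on 2024-04-15
print(evaluate(plan, booked=48, leads=100, today=plan.end))           # keep: observed 48% vs target 45%
```

Freezing the dataclass is the code equivalent of writing the number down before you see the data: changing the window or the threshold means creating a new plan, not quietly moving the goalposts. Note that the sketch compares the raw rate to the target exactly as the rule states; it doesn’t test for statistical significance.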

The hardest part of this isn’t the mechanics. The hardest part is sitting on your hands for 45 days while the early numbers look bad. Most operators can’t do it, which is why most operator-run tests get killed in week two and the gym ends up running the same three promos forever, reverting every new idea before it has a chance to fail or succeed cleanly.

One last thing. Pre-committed testing doesn’t mean you can’t stop a change early. If something is going so badly that it’s damaging the business, kill it. But “damaging the business” is a much higher bar than “the numbers look soft two weeks in.” If you can’t articulate concrete harm, it’s not bad enough to stop. It’s just early.

Always open to a conversation on this one. Curious how other operators handle the tension between patience and urgency, especially when the owner is watching the numbers too.