Examining the Cherry Pick Ratio

In December 2024, I published an article proposing a new metric: the Cherry Pick Ratio.

What is the Cherry Pick Ratio? It’s a metric that fills a gap in the generative AI space. It measures the proportion of useful outputs generated by a model, where usefulness is judged by the user. So the Cherry Pick Ratio is a measure of the effort a user must put in to generate an output that they’ll actually use.
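
To make the definition concrete, here is a minimal sketch of how the ratio might be computed from usage logs. The `Generation` record and its `kept` field are hypothetical (they assume you can log, per output, whether the user actually used it), and framing the ratio as attempts per kept output is one reading of the metric, not a canonical formula.

```python
from dataclasses import dataclass

@dataclass
class Generation:
    """One model output, plus a signal of whether the user actually used it."""
    session_id: str
    kept: bool  # hypothetical field: did the user keep/use this output?

def cherry_pick_ratio(generations: list[Generation]) -> float:
    """Attempts per kept output: 1.0 means every output was used;
    7.0 means seven generations for each output the user kept."""
    kept = sum(1 for g in generations if g.kept)
    if kept == 0:
        return float("inf")  # the user never found a usable output
    return len(generations) / kept
```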

The article has led to a number of interesting conversations. Many of them are positive. I’ve even heard colleagues begin to use the term in their own work. Some conversations have led to questions that made me think deeply about the concept that I am proposing.

In this article, I want to share some of the most challenging questions that I heard—and my reflections on them.

Is the Cherry Pick Ratio a process or product metric? #

This question is about where the Cherry Pick Ratio’s focus lies. Is it a measure of the user experience of a generative model—the process a user goes through to create an output—or is it a measure of the experience of the audience for the model’s outputs—the people who see the product?

The Cherry Pick Ratio falls solidly on the process side. It is meant to measure the number of attempts a user makes before they generate a product that they are willing to use.

I can certainly see how the product/output side is interesting. After all, what I’m willing to use is not necessarily something that my audience will like. However, the specific purpose of the Cherry Pick Ratio is to understand the process side. Perhaps this will lead to future metrics on the product side to help us understand audience behaviour toward AI-generated products.

How does the Cherry Pick Ratio measure different user group behaviours? #

Different user groups may use generative models in different ways. One example that was called to my attention is first-time users. When you use a new generative AI model for the first time, you may be more inclined to explore, play, and test. This could drastically increase your tolerance for bad cherries, because your intention is to explore, not to arrive at a solution as efficiently as possible.

On the other hand, an experienced user may expect a model to generate an output as efficiently as (or more efficiently than) the last time they used it.

I want to be clear that, while the Cherry Pick Ratio can tell us how efficiently a model arrives at a satisfactory output, it has a much broader application.

Ideally, a model can support a user journey from first-time user to loyal advocate. A first-time user may have a much higher tolerance than an experienced user. And indeed, as many readers pointed out, each user’s tolerance would change over multiple interactions with a system.

In my previous article, I addressed the concept of understanding user tolerance. You can conduct user research to understand this tolerance for different audience segments, and define unique Cherry Pick Ratios for each to find out if your model falls within your audience’s tolerance.
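
As a sketch of what that could look like in practice, the snippet below compares each segment’s observed ratio against a researched tolerance threshold. The segment labels, session fields, and `TOLERANCE` values are all hypothetical placeholders; the thresholds would come from the user research described above.

```python
from collections import defaultdict

# Hypothetical thresholds from user research: the highest ratio
# each segment will accept before abandoning the model.
TOLERANCE = {"first_time": 12.0, "experienced": 3.0, "paid": 2.0}

def ratio_by_segment(sessions: list[dict]) -> dict[str, float]:
    """Cherry Pick Ratio per audience segment. Each session dict is
    assumed to carry a 'segment' label, an 'attempts' count, and a
    'kept' count."""
    attempts: dict[str, int] = defaultdict(int)
    kept: dict[str, int] = defaultdict(int)
    for s in sessions:
        attempts[s["segment"]] += s["attempts"]
        kept[s["segment"]] += s["kept"]
    return {
        seg: attempts[seg] / kept[seg] if kept[seg] else float("inf")
        for seg in attempts
    }

def within_tolerance(ratios: dict[str, float]) -> dict[str, bool]:
    """Flag whether each segment's observed ratio sits inside its tolerance."""
    return {seg: r <= TOLERANCE.get(seg, float("inf")) for seg, r in ratios.items()}
```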

The Cherry Pick Ratio is, simply, a metric. Like impressions, pageviews or bounce rate, it’s a measure that can be used to study and optimize your model. But, for different audiences, the same metric can mean vastly different things.

How quickly do users adapt their expectations of a model’s capabilities? #

This is a challenging question. To be frank, I’m not sure I have a clear answer to it yet. And I’m interested in continuing to explore it.

The basis of the question is this: at what point does the user begin to expect less of a model? And, when that happens, do they modify their inputs to fit their perception of the model’s capabilities?

On one hand, I would argue that some users may reach the end of their tolerance for a model when they begin to believe the model is not capable of producing a satisfactory output. But other users will have a greater tolerance for experimentation and exploration (as I’ve addressed above).

People are adaptable. And generative AI is, in essence, a collaboration between the user and the model. That means that the intention of the user, their flexibility in what they consider a satisfactory outcome, and their willingness to work within what they perceive to be the model’s capabilities can all dramatically affect the Cherry Pick Ratio.

The Cherry Pick Ratio alone may not be able to help you understand exactly when or why your user adapts their behaviour.

However, it may be part of the puzzle. The Cherry Pick Ratio as a metric can still offer us interesting insights into the user’s interaction with a model. Perhaps paid subscribers expect a more efficient model than free users. Or, to continue the example above, perhaps first-time users with a higher Cherry Pick Ratio are less likely to use the model a second time—because they experimented with it at length and ultimately determined it was not capable of meeting their expectations.
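
One way to probe that last hypothesis, sketched here with hypothetical field names rather than as a prescription, is to bucket first-time users by their first-session ratio and compare how often each bucket returns:

```python
def retention_by_ratio_bucket(first_sessions: list[dict]) -> dict[str, float]:
    """Share of first-time users who started a second session, grouped
    by their first-session Cherry Pick Ratio. Each dict is assumed to
    carry 'ratio' (a float) and 'returned' (a bool)."""
    buckets: dict[str, list[bool]] = {"low (<=3)": [], "mid (3-7]": [], "high (>7)": []}
    for s in first_sessions:
        if s["ratio"] <= 3:
            buckets["low (<=3)"].append(s["returned"])
        elif s["ratio"] <= 7:
            buckets["mid (3-7]"].append(s["returned"])
        else:
            buckets["high (>7)"].append(s["returned"])
    # Return rate per bucket; empty buckets are dropped.
    return {b: sum(v) / len(v) for b, v in buckets.items() if v}
```

If the high-ratio bucket retains markedly fewer users, that would support the idea that extended cherry-picking in a first session signals lost confidence in the model.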

When are users willing to tolerate a high Cherry Pick Ratio? #

I want to start by addressing an assumption I’ve heard a few times. To be frank, it’s an assumption that I had going into this concept as well: that the goal is to have a lower Cherry Pick Ratio. That is, we want the model to generate useful outputs more often.

There are situations where that is true. An accountant using AI to generate a spreadsheet may expect that their first prompt results in a satisfactory outcome—especially if they have paid for expensive software.

But there are applications where a higher Cherry Pick Ratio may be not only tolerated, but preferable. AI has great potential not just for efficiency but for play, for collaboration, for artistic exploration. In any of those, your user may be more interested in how many different things they can create with a model than in racing toward the first satisfactory output.

One case study stands out to me in particular: users integrating multiple models into their process in an effort to lower the Cherry Pick Ratio.

For example, if I want to use a generative model to create a video, it may help me to input cinematic language into my prompt. Without knowing how to write in cinematic terms (as Mike Gioia dove into in a recent post), I may prompt a text-based model to create a cinematic prompt for the video model.

It’s a fascinating example. And it’s useful behaviour to understand. A video model that requires cinematic direction may be losing users who don’t arrive at a creative way to reduce the model’s friction. With that insight, you might work to make it easier to prompt with plain language. Or, you may embrace marketing your model to a niche audience who is fluent in cinematic language.

The Cherry Pick Ratio alone won’t tell us everything about this behaviour. But again, we can gain a measure of insight by interrogating the metric.

Final Thoughts #

Defining the Cherry Pick Ratio had an unintended consequence: the assumption that a lower Cherry Pick Ratio is always superior, or even the goal.

But further conversation has revealed a much broader application. The Cherry Pick Ratio is simply a metric. And, like any metric, it has useful applications in a number of contexts. Combined with a strong understanding of your model, your users, and the user experience, it can offer genuine insight.

Even more useful, it can reveal areas of exploration. If it takes a user seven attempts to generate a satisfactory outcome, that reveals six “failed” data points. Was the user generating multiple options with the same prompt? Integrating another model to completely modify their approach to prompting? Or, were they simply exploring, experimenting and playing?

The answers will help you to build on your successes—and, more importantly, to learn from your failures.

That’s the only way to grow.