*Elliott Stapley takes a look at the issue of variability across players’ statistical profiles season on season.*

The analysis in this article began with what I thought was a relatively simple question: how can Crystal Palace go about replacing Conor Gallagher? In the end, I never really answered this question. But I did arrive at a more fundamental question about the inherent variability in players’ statistical output season on season. Regardless, let’s start off thinking about Gallagher to see how I ended up where I did.

It’s important to clarify that Conor Gallagher is a bit of a weird player. He runs a lot. He tackles a lot. He shoots a lot. He assists a lot. But he doesn’t touch the ball much. There’s even a whole article on the topic.

But it’s 2022 and football data reigns supreme. There’s never been more access to player data and there’s certainly never been more analysis of that data, ranging from the individual metric level (x player is y-th in the league for z metric), to scattergraphs (n players are i-th and j-th in the league for x and y metric), to pizza charts (x player sits in the y-th percentile for n different metrics). We have an enormous number of ways of classifying players by their performance. You’d think we could be relatively sure what sort of player anyone is, regardless of how weird their profile might be, right?

Tasked with replacing a player, the logical first step would be to do some similarity analysis, where we try to locate other players who do the things we care about at a comparable rate to that player. This should provide a shortlist of players who are good immediate replacements for the output of the player you are looking to replace.

Before we consider doing that though, there’s a potentially significant issue with this approach in this specific case: Conor Gallagher this year and Conor Gallagher last year are two very different footballers.

Here’s some viz to explain the issue better:

In the last two seasons, Gallagher’s gone from more of a combative, defensive holding presence to an all-action final ball demon. Two profiles. One footballer. What gives?

The issue lies in our selection of—and faith in—event data to bin players. A player realistically can be most fundamentally described as the sum of their technical, tactical, physical and mental capacities. Aggregate event data and stats profiles show us what this footballer has been asked to do, not what they actually are.

Here’s a nice example of the similarity analysis mentioned earlier. Using FBref/StatsBomb’s player similarity tool for Gallagher produces his mirror player/substitute at Palace, Jeffrey Schlupp. I don’t think this is an indication that Palace have a ready-made Gallagher replacement. Schlupp used to be a left back. Before that, he was a winger. Gallagher used to be a 6. Before that, he was a 10. They’re not similar players. They’re just doing similar things and so they have similar stats profiles.

With that in mind, our question now becomes what and who we are fundamentally trying to replace, and whether using a statistical output comparison is a valid approach. We know that player output can vary heavily year on year due to many factors (form, system, role, transfers), so let’s go about quantifying it.

**Methodology**

Framing the issue, we want to know how much commonly used metrics vary on a per-player basis, year on year. This should give us an indication of the relative stability of these metrics overall and the consistency of each player’s performance in these metrics. In effect, we are quantifying how much a stats profile can reasonably change on a seasonal basis.

To go about doing so, we’re going to use variability, which is defined as:

The extent to which the data set diverges from the average value. Or more simply, the standard deviation (the root-mean-square distance from the mean) of the data, divided by the mean. This is also known as the coefficient of variation.

The first step of the analysis was to find suitable players to check historically: using all available Premier League data on FBref (five seasons), I aggregated all of the players who have played more than ten 90s in at least three PL seasons. I’ve then binned them positionally, as you would obviously expect the stability of something like goals per 90 to differ between forwards and defenders.
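As a sketch of this filtering step, assuming a long-format table of player-seasons (the column names here are my own, not FBref’s actual schema, and the numbers are toy data):

```python
import pandas as pd

# Toy long-format data: one row per player per season.
# Column names are illustrative, not FBref's actual schema.
df = pd.DataFrame({
    "player":  ["A", "A", "A", "B", "B", "C", "C", "C", "C"],
    "season":  [2018, 2019, 2020, 2019, 2020, 2017, 2018, 2019, 2020],
    "ninetys": [25.0, 30.1, 18.4, 12.0, 9.5, 11.2, 22.8, 31.0, 27.5],
})

# Keep only player-seasons with more than ten 90s played...
eligible = df[df["ninetys"] > 10]

# ...then keep players with at least three such seasons.
season_counts = eligible.groupby("player")["season"].nunique()
multi_season_players = season_counts[season_counts >= 3].index.tolist()

print(multi_season_players)  # player B falls short of three qualifying seasons
```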

Here’s an example calculation of variability of a single metric on one player’s history—in this case, Salah’s goals per 90:

We get a mean of 0.73, standard deviation of 0.17, and thus, a variability of 0.23, meaning that the standard deviation of Salah’s goalscoring on a seasonal basis is around a quarter of the mean.
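The same calculation in code, as a sketch. The season values below are illustrative stand-ins rather than Salah’s actual per-season figures, chosen to land near the quoted mean and standard deviation (I’ve used the sample standard deviation here; the article doesn’t specify which convention was used):

```python
import numpy as np

# Illustrative goals-per-90 values for five seasons (not real data).
goals_per_90 = np.array([0.95, 0.55, 0.73, 0.85, 0.57])

mean = goals_per_90.mean()        # 0.73
std = goals_per_90.std(ddof=1)    # sample standard deviation, ~0.17
variability = std / mean          # coefficient of variation, ~0.24

print(f"mean={mean:.2f}, std={std:.2f}, variability={variability:.2f}")
```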

This was then calculated across all metrics for all of the multi-season players, using per 90 values and discounting the variability of any player whose rate in a metric fell below 10% of that metric’s overall mean (near-zero means would otherwise inflate variability).
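Sketching that loop under my reading of the filter (the data structure and names below are my own, and the numbers are toy data):

```python
import numpy as np

# {player: {metric: [per-90 value for each qualifying season]}} -- toy data.
history = {
    "P1": {"goals": [0.50, 0.30, 0.40], "passes": [45.0, 48.0, 46.5]},
    "P2": {"goals": [0.02, 0.00, 0.01], "passes": [60.0, 55.0, 58.0]},
}

def variability(values):
    # Coefficient of variation: sample standard deviation over the mean.
    values = np.asarray(values, dtype=float)
    return values.std(ddof=1) / values.mean()

results = {}
for metric in ["goals", "passes"]:
    # Overall mean of the metric across every player-season.
    all_values = [v for p in history.values() for v in p[metric]]
    overall_mean = np.mean(all_values)
    per_player = {}
    for player, metrics in history.items():
        # Discount players performing the metric at under 10% of the
        # overall mean, so near-zero rates don't inflate variability.
        if np.mean(metrics[metric]) >= 0.1 * overall_mean:
            per_player[player] = variability(metrics[metric])
    results[metric] = per_player
```

Here P2’s goalscoring is too rare to count towards the goals variability figure, while both players qualify for passes.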

**Metric Variability Analysis**

Below is a histogram showing the summary information for how much a particular metric varied seasonally for all of the multi-season players:

Taking the mean of each metric’s variability allows us to start to draw some conclusions:

This is the least pretty but most significant conclusion from this analysis. We can clearly see the relative instability of most commonly used performance metrics, i.e. the extent to which past performance relates to current performance.

As you’d expect, we’ve got the usual “well, they’re quite random” candidates at the top: goals, assists, and shot conversion are all very unstable metrics. Metric variability also differs by position (e.g. goals are more unstable for defenders, while clearances are more stable).

This analysis is a nice indicator of the value of using underlying metrics. They’re clearly more stable according to this methodology: NP xG has half the variability of NP goals. This is why we use xG; there are nine times more shots than goals. Shots happen more often, and so xG is a more stable performance indicator than goals. It’s also why contribution models are very valuable. In effect, if someone does something a lot, we can know better whether they are actually good at it.

As you might expect, then, the stable metrics are all high volume or rate stats like touches, passes, pass accuracy. Low event rate metrics have higher variability. This seems logical: if something doesn’t happen often, then over repeated samples of a given length, the observed frequency will vary more relative to its mean.

This effect is quantified below:

This is a significant validation of the methodology and matches the expectation that high rate events will have lower relative variability. Low rate events are clearly unstable and relatively poor predictors of future or descriptors of past seasonal performance. Does this mean we should ignore them? No. But we should be aware that they are high variance and perhaps weight them less significantly in our player analysis/comparisons.
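This rate effect can also be sketched with a quick simulation, assuming seasonal event counts behave roughly like Poisson draws (the rates below are arbitrary round numbers, not taken from the data): rare events should show a much higher average coefficient of variation than frequent ones.

```python
import numpy as np

rng = np.random.default_rng(0)
N_SEASONS = 5
N_PLAYERS = 10_000

def mean_cv(rate):
    # Simulate per-player seasonal event counts at a given rate, then
    # average the coefficient of variation across players.
    counts = rng.poisson(rate, size=(N_PLAYERS, N_SEASONS))
    means = counts.mean(axis=1)
    stds = counts.std(axis=1, ddof=1)
    valid = means > 0  # skip players whose simulated counts are all zero
    return (stds[valid] / means[valid]).mean()

cv_rare = mean_cv(3)      # a rare event, e.g. a handful of goals a season
cv_common = mean_cv(300)  # a frequent event, e.g. passes

print(cv_rare, cv_common)  # the rare event is far more variable
```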

**Player Variability Analysis**

To show how this information could prove useful in player recruitment, we’re back on the pizza charts. By comparing a player’s variability in each metric to that of their positional peers (a percentile rank of the player’s variability among peers), you can generate a measure of consistency.

If a player has an extremely tight grouping for a metric over multiple seasons, this will indicate lower-than-average variability and thus produce a high consistency ranking. Having this consistency information is potentially valuable from a recruitment perspective, as it indicates how confident you can be that the player’s average level in a metric is a good or poor representation of their overall level.
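A minimal sketch of that consistency measure, assuming we already have a player’s variability in a metric and the variabilities of their positional peers (the helper name and values are mine):

```python
def consistency_rank(player_variability, peer_variabilities):
    """Percentage of positional peers with equal or higher variability.

    A tight multi-season grouping (low variability relative to peers)
    yields a high consistency ranking.
    """
    higher_or_equal = sum(v >= player_variability for v in peer_variabilities)
    return 100.0 * higher_or_equal / len(peer_variabilities)

# Illustrative peer variabilities for one metric.
peers = [0.10, 0.25, 0.40, 0.55, 0.70]
print(consistency_rank(0.10, peers))  # most consistent of the group: 100.0
print(consistency_rank(0.70, peers))  # least consistent: 20.0
```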

This is all a bit vague, so here’s a practical example. Let’s compare two Manchester City midfielders who illustrate the contrast well: Ilkay Gundogan, who has had a heavily variable role, and Rodri, who has largely maintained stability in his role.

Here’s a pizza chart showing Rodri’s percentile rankings alongside his consistency across the last few seasons:

As you would expect, Rodri is both a very good and very consistent player. He’s played in the same role for the same coach in the same team for the entire sample. Year on year, his statistical profile should remain largely similar and it has. There is some variation in his defensive metrics, as he has transitioned into a deeper, more controlling role.

Now, let’s look at how Ilkay Gundogan performs on the same visualisation:

Gundogan has some wild variation in the majority of his metrics. He is extremely good at several things but shows extremely poor consistency over those metrics, e.g. open play goals. He has proven on average to be an excellent goalscorer from open play but the overall numbers disguise huge inconsistency.

The same is true of his other offensive metrics, because he has played multiple distinct roles for Guardiola in his time at the club. If you’d just interpreted his stats profile by checking the percentile rank for open play goals per 90, you’d have missed this information, which highlights the potentially system-driven, form-dependent or random nature of his offensive output.

To illustrate one final takeaway from this analysis, I’ve averaged the consistency rank over all the metrics shown in the pizza chart for every player in the dataset to create an overall consistency ranking. I’ll state explicitly: consistent does not necessarily equal good. Much inconsistency is generated by transfers or positional change rather than player-level inconsistency in form. It goes without saying that Gundogan is one of the best central midfielders in the world, and yet he has shown tremendous variation in his statistical profile.

Here are the top five and bottom five PL players by consistency:

The players at the top of the list have been at one club or two very comparable clubs, played in the same role, and have been largely consistent in their performance levels. The bottom of the list includes players who’ve made transfers between clubs, played several different roles under several different coaches, or just flat-out been inconsistent.

**Conclusion**

I hope that the value and intention of this article is clear: a player’s seasonal stats profile is not a complete descriptor of a player, nor does it define their past or future performances. Players’ performance levels change wildly season on season in many metrics due to inherent randomness in the distribution of the events the metrics describe, but also due to form, system changes, role changes, and transfers. This is why comparing snapshots of seasonal performances of players across different leagues is potentially problematic and why Palace won’t just let Jeffrey Schlupp be their Gallagher replacement (I hope).

Consideration of relative consistency in the metrics you care about when profiling players is an important stage in your analysis. To repeat: high or low consistency doesn’t necessarily mean good. If you do something badly but very consistently, that is not certain to be better than someone doing something well but very inconsistently. Similarly, relative consistency levels are not an indictment of individuals but perhaps an outcome of the situation in which they’ve been asked to perform, or due to an inherently unstable metric.

In short, uncertainty analysis matters and statistical profiles are not complete snapshots of players.

**Suggestions for Future Work**

- Position-specific analysis
- Better averaging of the variability (maybe include attempt rate in the weighting)
- Bootstrapping data for players with a small number of seasons played, to actually use Gallagher in the analysis
- Team style analysis using variability
- Effect of managers on player and team level metric variability