Expected Goals has become all-consuming in football analytics. Every prediction, whether at the team or player level, seems to be based on shot-based Expected Goal models. In general this makes sense: Expected Goals are a very intuitive concept that essentially captures chance quality, and they have very real and useful applications.
The problem with Expected Goals is that they only tell us about shots, and over the course of a game shots actually make up very few of the many different types of actions and events. The most frequent of these actions are passes.
Though there are hundreds of passes every game, I’ve decided to focus on a small subset of them for the purposes of a non-shot-based Expected Goal model called PEP: Pass ExpG Projections.
I’ve focused on passes into the danger zone (essentially passes into the penalty area) and, by combining two simple probabilities, assigned each of these passes an ExpG value regardless of whether or not it turns into a shot.
Firstly, there is the basic shot ExpG calculation itself. I’ve created a basic ExpG model using only the distance of the shot from goal and the angle of the shot to goal. Even in the more complex Expected Goal models these tend to be the two most significant factors. This Expected Goal calculation gives the first of the two probabilities that make up PEP:
P(Goal|Shot Location) = ExpG
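As a rough illustration of what a distance-and-angle model can look like, here is a minimal sketch. It assumes a logistic form, which is a common choice for ExpG models but is not necessarily the form used here. The coordinate system (metres, with the centre of the goal at the origin), the function names and the coefficients are all hypothetical; the coefficients in particular are made up for illustration and are not fitted to any data.

```python
import math

GOAL_WIDTH_M = 7.32  # distance between the posts, in metres

# Hypothetical coefficients, chosen only for illustration (NOT fitted to data).
INTERCEPT = -1.0
DISTANCE_COEF = -0.10  # per metre from the centre of the goal
ANGLE_COEF = 1.5       # per radian of visible goal mouth


def shot_distance(x, y):
    """Distance from (x, y) to the centre of the goal at (0, 0).

    x is the distance from the goal line (must be > 0),
    y is the lateral offset from the centre of the goal.
    """
    return math.hypot(x, y)


def shot_angle(x, y):
    """Angle in radians subtended by the two posts as seen from (x, y)."""
    half = GOAL_WIDTH_M / 2
    return math.atan2(half - y, x) - math.atan2(-half - y, x)


def expg(x, y):
    """P(Goal | Shot Location) under the assumed logistic form."""
    z = (INTERCEPT
         + DISTANCE_COEF * shot_distance(x, y)
         + ANGLE_COEF * shot_angle(x, y))
    return 1.0 / (1.0 + math.exp(-z))
```

With any sensible coefficients, a model of this shape rates a central shot from close range above a shot from further out or from a tight angle, which is the behaviour the two inputs are meant to capture.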
This is nothing new. The extension of PEP is the inclusion of a second probability.
The second probability answers this question: given any individual pass into the danger zone, what is the probability that the team will generate a shot within the next five seconds? I chose five seconds for two reasons. The window is short enough that the danger zone pass is still an important driver of the play, since the impact of the pass on the shot gets smaller as the time between them gets larger. It is also long enough that if the player didn’t shoot right away, or there was some quick interim action, the effect of the pass would still be captured by the model. I calculated this second probability separately for different pass origin locations.
P(Shot in next 5 seconds|Danger Zone Pass Origin)
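One way this probability could be estimated from event data is sketched below. The record layout (tuples of match, team, time and zone), the zone labels and the function name are assumptions made for the example and do not correspond to any particular data provider's format.

```python
from bisect import bisect_left
from collections import defaultdict

WINDOW_SECONDS = 5.0


def shot_probability_by_zone(passes, shots):
    """Estimate P(shot within 5 seconds | pass origin zone).

    passes: iterable of (match_id, team, time_seconds, origin_zone),
            one entry per danger zone pass.
    shots:  iterable of (match_id, team, time_seconds).
    Returns a dict mapping each origin zone to the share of its passes
    that were followed by a shot from the same team inside the window.
    """
    # Index shot times by match and team so each pass is only compared
    # with shots taken by the passing team in the same match.
    shot_times = defaultdict(list)
    for match_id, team, t in shots:
        shot_times[(match_id, team)].append(t)
    for times in shot_times.values():
        times.sort()

    attempts = defaultdict(int)
    followed = defaultdict(int)
    for match_id, team, t, zone in passes:
        times = shot_times.get((match_id, team), [])
        i = bisect_left(times, t)  # first shot at or after the pass
        attempts[zone] += 1
        if i < len(times) and times[i] - t <= WINDOW_SECONDS:
            followed[zone] += 1

    return {zone: followed[zone] / attempts[zone] for zone in attempts}
```

How finely the pitch is divided into origin zones is a design choice: smaller zones give more detail but fewer passes per zone, so the estimates become noisier.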
The Expected Goal value that a danger zone pass adds to an attack is essentially the probability that the pass is converted into a shot multiplied by the probability that a shot from the location where the pass was received is scored. This joint probability, calculated as the product of the two, gives us the Pass ExpG Projection.
PEP = P(Shot in next 5 seconds|Pass Origin) x P(Goal|Pass Received Location)
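To make the calculation concrete, here is a small sketch of how the two pieces combine for a single pass. The zone labels, the lookup values and the stand-in ExpG function are all made up for the example; in practice the lookup would come from the pass data and the ExpG function from the shot model.

```python
def pep(origin_zone, reception_xy, p_shot_by_zone, expg_fn):
    """PEP = P(shot in next 5 seconds | pass origin) x P(goal | reception location)."""
    p_shot = p_shot_by_zone[origin_zone]
    p_goal = expg_fn(*reception_xy)
    return p_shot * p_goal


# Made-up inputs, for illustration only.
EXAMPLE_P_SHOT_BY_ZONE = {"central": 0.10, "wide": 0.04}


def example_expg(x, y):
    """Stand-in for a real ExpG model: returns a constant 0.12."""
    return 0.12


example_value = pep("central", (9.0, 2.0), EXAMPLE_P_SHOT_BY_ZONE, example_expg)
```

With these invented numbers the pass is worth roughly 0.10 x 0.12 = 0.012 expected goals, whether or not the receiver actually shoots.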
Note that while I used shot data to construct the ExpG model, no shot data at all are used in the PEP calculation itself. The only data used are the locations where the pass originated and where it was received.
I applied this model using Premier League data from the 2014-15 season and mapped the following PEP results (the colouring scheme is smoothed over small zones).
The results look fairly intuitive: passes from just above the box into the box have the highest PEP values, while passes from the half-spaces or from areas further away from the penalty box have lower PEP values.
One interesting fact to note is that the average pass has a PEP of only 0.005. That seems quite low, but it starts to make sense when you set the 4,222 danger zone passes from the Premier League last season against the only 975 goals, most of which did not come within five seconds of a danger zone pass.
So what is the value of a model like PEP? Firstly, it gives us an opportunity to distinguish between providers (usually midfielders) more effectively by analysing the value of their danger zone passes entirely independently of the player they were passing the ball to. We already have Expected Assist models, which look at the ExpG value of the shots that players take after receiving a pass, but these rely on the player receiving the pass to take a shot. PEP completely removes the subsequent actions of the receiving player from the equation.
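As a sketch of how providers might be compared, the PEP values of each player's danger zone passes can simply be added up and averaged. The input format and the function name below are hypothetical.

```python
from collections import defaultdict


def pep_by_passer(pass_values):
    """Summarise PEP per passer.

    pass_values: iterable of (passer_name, pep_value), one per danger zone pass.
    Returns a dict mapping passer -> (total PEP, mean PEP per pass).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for passer, value in pass_values:
        totals[passer] += value
        counts[passer] += 1
    return {p: (totals[p], totals[p] / counts[p]) for p in totals}
```

The total rewards volume and the mean rewards the quality of the typical pass, so looking at both helps separate a player who plays many low-value balls into the box from one who plays fewer but more dangerous ones.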
The second main focus is defensive. Tom Worville had a very interesting piece on evaluating defenders using Expected Goal Relief. Essentially, he looked at the Expected Goal value each defensive action was taking away from the opposing team. The only problem with this type of calculation is that Expected Goals measure the value of a shot from a given place on the pitch, so assuming that having the ball at that location is worth as much as shooting from it introduces an inherent upward bias.
PEP is based on completed passes into the danger zone, so if a player intercepts or prevents a pass into the danger zone we can work out what the PEP of that pass would have been and assign it to the defender as Expected Goal Relief. This gives us a further way to put a number on defensive actions while taking into account the context of each particular action.
I started by pointing out that there are thousands of actions that take place on a football pitch over the course of a game, and that we’ve previously only had models that define ExpG values for a small subset of these actions: shots. The idea of PEP is to start working out and providing ExpG values for a new subset of actions, namely passes. This simple model only looks at the value of passes into the danger zone, but using the same (or a similar) rationale we could work backwards to assign PEP values to thousands and thousands more passes.
Shot-based ExpG models can only get us so far. It’s time to start looking at more actions, and who better to look to than Pep?
I should point out that I am not the first to come up with the idea of a non-shot-based Expected Goal model. Dan Altman outlined one in his OptaPro Forum presentation from earlier this year, and Will TGM also looked at valuing possessions in a similar way.