In February, I wrote a piece for The Ringer called “David De Gea Shouldn’t Be This Good”. The piece is about the Spaniard’s effect on Manchester United this season: he has almost single-handedly dragged them into 2nd place.
Goalkeepers play an almost completely different sport to their team-mates, and yet their contribution is probably the easiest for analytics to isolate. That is not to say it is easy, only that it is the least hard. Where the intended contribution of a central midfielder is extremely difficult to isolate for someone without any knowledge of a coach’s instructions, a goalkeeper’s role can largely be taken as read: to stop goals.
Post-shot expected goal models (in this case, using Stratagem’s subjective shot data that ranks a shot based on its quality) allow us to pin this down reasonably effectively. Of course, there are deficiencies: our current event data does not codify how quickly the shooter pulled the trigger after receiving the ball, how fast the shot travels towards goal, or how much it curls on the way.
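To make the idea concrete, here is a minimal sketch of how a post-shot model can be turned into a single shot-stopping number: add up the scoring probability of every on-target shot a goalkeeper faced, then subtract the goals he actually conceded. The `ShotFaced` type, the `goals_prevented` function and the four shots are all made up for illustration. They are not Stratagem’s data or anyone’s actual model.

```python
from dataclasses import dataclass


@dataclass
class ShotFaced:
    # Hypothetical record of one on-target shot faced by a goalkeeper.
    post_shot_xg: float  # model's probability that this shot is scored
    conceded: bool       # whether it actually went in


def goals_prevented(shots):
    """Expected goals conceded minus actual goals conceded.

    A positive value means the keeper let in fewer goals than the
    post-shot model expected from the shots he faced.
    """
    expected = sum(s.post_shot_xg for s in shots)
    actual = sum(1 for s in shots if s.conceded)
    return expected - actual


# Invented example: four shots on target, one goal conceded.
shots = [
    ShotFaced(0.10, False),
    ShotFaced(0.45, False),
    ShotFaced(0.80, True),
    ShotFaced(0.25, False),
]
print(round(goals_prevented(shots), 2))
```

The number is only as good as the probabilities fed into it, so anything the shot data fails to capture ends up credited to, or blamed on, the goalkeeper.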
There is some discussion in the piece about whether we can properly pin down the points impact of a goalkeeper. Given the numerous interaction effects between players on the pitch, I’m sceptical: had Manchester United’s defence played well this season, De Gea would not have had the chance to make the impact that he has. While we could simulate all possibilities and extract the average expected impact of De Gea, that average would potentially under-play how often goalkeeper impact occurs at the tail end of the distribution of possible outcomes.
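A toy simulation shows why the average can mislead. The sketch below replays many invented seasons twice on the same random draws, once with an average goalkeeper and once with one who cuts the scoring probability of every shot by a quarter, and compares the points totals. Every number in it is an assumption chosen for illustration (12 chances a game at 0.11 each, 4 shots faced at 0.30 each, a `save_boost` of 0.25), and it deliberately ignores the interaction effects described above.

```python
import random
import statistics


def points(goals_for, goals_against):
    # Standard league scoring: 3 for a win, 1 for a draw, 0 for a loss.
    if goals_for > goals_against:
        return 3
    return 1 if goals_for == goals_against else 0


def season_points_gained(rng, save_boost=0.25, matches=38):
    """Points gained over one simulated season by swapping in a better keeper.

    All rates are illustrative assumptions, not fitted to any real team.
    """
    gained = 0
    for _ in range(matches):
        # Own attack: 12 chances per match, each scored with probability 0.11.
        goals_for = sum(rng.random() < 0.11 for _ in range(12))
        # Shots faced: reuse the same draws for both keepers so that the
        # only difference between the two replays is the goalkeeper.
        draws = [rng.random() for _ in range(4)]
        against_avg = sum(u < 0.30 for u in draws)
        against_elite = sum(u < 0.30 * (1 - save_boost) for u in draws)
        gained += points(goals_for, against_elite) - points(goals_for, against_avg)
    return gained


rng = random.Random(0)  # fixed seed so the run is repeatable
gains = sorted(season_points_gained(rng) for _ in range(1000))
mean_gain = statistics.mean(gains)
p95_gain = gains[int(0.95 * len(gains))]  # a season near the top of the range
print(f"mean points gained: {mean_gain:.1f}")
print(f"95th percentile:    {p95_gain}")
```

The gap between the mean and the upper percentile is the point: a single figure for average impact says little about the seasons in which the shots happen to fall in exactly the matches where a save turns a draw into a win.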