Regularized Adjusted Plus Minus Me and You and Zoboomafoo
I built my first RAPM model, here is what I found out
At the end of the season, when the NHL changed a bunch of things in their API, I also fixed mine and took the time to save the regular season data in case I needed it over the summer. After spending weeks on prospects work, I did not think the day when I would do something with my NHL data would arrive, but it actually arrived today. I woke up in the middle of the night over the weekend thinking about RAPM, and the voices in my head told me to build that RAPM model. And this is what I did today.
A little bit of history before we go further
RAPM is the evolution of the APM stat, which comes from basketball. APM was created by Jeff Sagarin and Wayne Winston about 20 years ago because they thought that raw +/- was not enough to predict how good a player would be. So APM was created to estimate how good a player would be when a coach puts him on the court. With a lot of data, it was widely considered the best metric for player evaluation. RAPM uses ridge regression (also called L2 regularization), which can be read as a Bayesian method with a prior that pulls every player's rating toward zero, to give even more predictive power when it comes to smaller sample sizes. It still comes with its share of criticism. The famous @yolo_pinyato came up with this chart that summarizes the weaknesses of famous metrics.
RAPM was meant to be predictive, but I am not sure it really is. A simple exercise to visualize the predictiveness of RAPM is to look at how it varies for a player from one season to the next.
It is a really rudimentary experiment that comes with its load of “this does not tell the whole story”, but it is a really easy-to-understand way of showing the predictiveness of RAPM for different metrics. I took the RAPM data available on Evolving-Hockey for the past 7 seasons for this test. Despite the fact that APM (the ancestor of RAPM) was created to predict how good a player would be, it turns out that we can’t fully rely on it to assume how good a player will be next season. The real power of RAPM comes from RAPM C±/60 (Corsi, i.e. shot attempts), which holds significant predictive power. Even if Corsi is being more and more ignored in favor of more complex metrics such as expected goals, it is important to note that when it comes to predictions, you might get better results with RAPM C±/60. RAPM does a great job at explaining what happened in a relatively complex context. I think it is fair to say that it is the best metric available when it comes to player evaluation.
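For anyone who wants to repeat the season-to-season comparison described above, here is a minimal sketch of how it could be done. It pairs each player's value in one season with his value in the next and computes a Pearson correlation. The function names, the dictionary layout, and the numbers are all assumptions made up for illustration; they are not the Evolving-Hockey data and not necessarily the exact calculation behind the chart.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def season_pairs(ratings):
    """Pair each player's value in season t with his value in season t + 1."""
    pairs = []
    for (player, season), value in ratings.items():
        next_value = ratings.get((player, season + 1))
        if next_value is not None:
            pairs.append((value, next_value))
    return pairs


# Made-up numbers for illustration only, NOT real Evolving-Hockey data.
# Keys are (player, season); values stand in for something like RAPM C±/60.
toy_ratings = {
    ("Player A", 2022): 4.1, ("Player A", 2023): 3.2, ("Player A", 2024): 3.8,
    ("Player B", 2022): -1.5, ("Player B", 2023): -0.4,
    ("Player C", 2023): 0.9, ("Player C", 2024): -0.2,
}

pairs = season_pairs(toy_ratings)
this_season = [a for a, _ in pairs]
next_season = [b for _, b in pairs]
r = pearson(this_season, next_season)
print(f"{len(pairs)} season pairs, year-over-year r = {r:.2f}")
```

The closer r is to 1, the more a player's value in one season tells you about the next one. Running the same calculation once per metric is what makes the metrics comparable.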
RAPM is a really important metric in hockey analysis. It serves as the eggs and the milk for the cake that is WAR/GAR (Wins Above Replacement / Goals Above Replacement).
So what?
I built my first RAPM model using C± (on-ice shot attempt differential) as the target variable. The results are actually interesting.
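For readers who want to see the mechanics, here is a minimal sketch of the general idea, not the actual model behind the charts. It assumes the game is cut into stints (stretches of play with the same skaters on the ice), gives each player a single net rating (+1 when he is on the ice for the home team, -1 for the away team), uses the home shot attempt differential per 60 minutes as the target, weights stints by their length, and solves the ridge regression in closed form with NumPy. The stint layout, the function names, the 2-on-2 toy data, and the penalty value are all assumptions for illustration. Public hockey RAPM models are usually richer than this, for example with separate offensive and defensive ratings and extra context variables.

```python
import numpy as np


def build_design_matrix(stints, players):
    """One row per stint: +1 for home skaters on the ice, -1 for away skaters."""
    column = {player: i for i, player in enumerate(players)}
    X = np.zeros((len(stints), len(players)))
    for row, stint in enumerate(stints):
        for player in stint["home"]:
            X[row, column[player]] = 1.0
        for player in stint["away"]:
            X[row, column[player]] = -1.0
    return X


def fit_ridge(X, y, weights, lam):
    """Weighted ridge: minimise sum_i w_i * (y_i - x_i . b)^2 + lam * ||b||^2."""
    A = X.T @ (weights[:, None] * X) + lam * np.eye(X.shape[1])
    rhs = X.T @ (weights * y)
    return np.linalg.solve(A, rhs)


# Hypothetical 2-on-2 toy stints, invented for illustration (not NHL data).
# cf / ca = shot attempts for / against the home side during the stint.
players = ["A", "B", "C", "D"]
stints = [
    {"home": ["A", "B"], "away": ["C", "D"], "seconds": 60, "cf": 3, "ca": 1},
    {"home": ["A", "C"], "away": ["B", "D"], "seconds": 45, "cf": 2, "ca": 2},
    {"home": ["A", "D"], "away": ["B", "C"], "seconds": 30, "cf": 1, "ca": 0},
    {"home": ["B", "C"], "away": ["A", "D"], "seconds": 50, "cf": 0, "ca": 2},
]

X = build_design_matrix(stints, players)
seconds = np.array([s["seconds"] for s in stints], dtype=float)
diff = np.array([s["cf"] - s["ca"] for s in stints], dtype=float)
y = diff / seconds * 3600.0  # home C± per 60 minutes for each stint

# lam is an arbitrary value here; in practice it would be tuned,
# typically by cross-validation.
ratings = fit_ridge(X, y, weights=seconds, lam=100.0)
for player, rating in sorted(zip(players, ratings), key=lambda pr: -pr[1]):
    print(f"{player}: {rating:+.2f} C±/60")
```

Raising `lam` pulls every rating toward zero, which is what keeps players with very little ice time from getting extreme values. Setting it to zero would give plain APM, although in this toy example the unpenalised problem has no unique solution because every stint has the same number of skaters on each side.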
Analytics darling Auston Matthews comes out on top of the forwards in the 2024 5v5 rankings from my RAPM model. Matthews played with really talented players this season, so there is a chance that the model gave him too much credit for the offense created when he was on the ice. I don’t think that happened, but I think it is a valid question to ask when we look at model outputs. HockeyViz does a great job at isolating coaching, teammates, and other external effects on players’ performances. I think it is important to keep in mind that all those factors impact the model and even our own interpretation of what we see when we watch hockey. Everyone can watch hockey, but when it comes to dissecting the information we captured during our viewing session, that is really when the fun starts. Hockey is so complex that we will probably never come to a consensus, and models do a good job of keeping us away from any hope of consensus just when we think we are close to achieving that impossible task. Another thing we can notice is the presence of role players on the chart. We can just assume that they did really well when they were deployed. So they were just great role players last season. They are not better than Connor McDavid, don’t worry.
As for the defensemen, my model tends to agree with the Norris voting. Quinn Hughes seems to be an excellent choice for the trophy awarded to the defenseman of the year. Again, there are a few surprises, but as I explained earlier, they are explainable. I also think that Bouchard was probably much better offensively, but his offense was credited to his teammates. It is something to keep in mind, like I said.
When I was much more immature, I used charts to dunk on players, teams, and hockey people in general. Growing in the field of data science, I realized I was doing a disservice to the profession of data scientist and to the creators of models by acting that way. So I am not going to post the worst players, simply because I don’t want people with bad intentions to use my work in a way that I did not mean.
A few players who caught my attention
I was one of his biggest critics (not to say haters), but he had a strong season and really established himself as a good 3rd pair NHLer. He outgrew the Sheriff role that was probably easy to settle into and became a player who could provide offense and defense when he was tasked to. He will deserve every penny of the contract he gets this summer.
Despite the model probably underrating his offense, we can happily see that Selke McDavid has arrived. I think that if the model correctly rated his offense, he would probably be in the 100th percentile with Matthews in terms of impact.
Connor Bedard would’ve broken the league into pieces if he had come in 10 years earlier. Unfortunately, the league was ready for him, and players like him can’t tear up entire defenses by themselves at the age of 18 without a good supporting cast. Luckily for him, the Blackhawks spent a lot of money this summer to improve the team.
The Capitals did not only get CapFriendly over the summer. They kickstarted the hockey summer with the blockbuster trade that sent their backup goalie to Los Angeles in exchange for the infamous Pierre-Luc Dubois. Dubois has changed teams multiple times in his young career, and after signing a massive contract with the Kings, he struggled to perform as well as you want from a player you just signed to an $8M per year contract. In Washington, PLD wants to prove that his 2023-24 season was only a fluke. I think he is overhated: most of the time, “overpaid” players are negative players, but unlike them, Dubois actually provides positive value even if it is not $8M value. He can prove the haters wrong on that completely reshaped Capitals team, and most importantly he can try to help Alex Ovechkin catch Wayne Gretzky.
Finally, let’s talk about the biggest free agent of the summer: Jake Guentzel. The Tampa Bay Lightning made the biggest splash of the summer when they sacrificed their captain Steven Stamkos (and Mikhail Sergachev) to sign Jake Guentzel. Guentzel is a former underrated cheap Crosby-merchant (just like a lot of others) who learned so much from playing with Crosby that he outgrew his Crosby-merchant role to become an elite player himself. Now, he is one of the best offensive players in the league and he brings some defensive value. Was it worth it to throw the captain away to sign the big fish? Only time will tell, but so far, it looks like a home-run move that you must make if you have the chance.
On that note, I hope you enjoyed the article. Feel free to ask me in private messages if you want the card for other players. Feel free to correct me if I made some mistakes. And enjoy the summer.