12 min read

Quantifying Differences between the Regular Season and Playoffs using Survival Analysis

Introduction

From a casual fan’s perspective, the intensity traditionally ramps up in the playoffs because teams are closer to the grand prize, the Stanley Cup. Fans are hyped up by the storylines and rivalries for every series, and so each event feels all the more momentous. So, how different are the rates of goals, shots, or hits from the regular season to the playoffs? Does the fact that a game is played during the playoffs change these rates significantly? Which rates don’t change that much?

Methods

The rates of events such as goals or shots follow a Poisson process, which counts the rate of events happening in a certain time period, such as a day or a week. One of the key assumptions of Poisson processes is that interarrival times, the time between events, follow an exponential distribution. I’ve checked this assumption for my data, which is included in the Appendix. Some real world examples of Poisson processes include number of car accidents in an area or the requests on a web server.

Since Poisson processes do not account for the event not occuring in the duration of a game (ex.A 2-0 game where the losing team doesn’t score), I pivoted my focus to survival analysis. According to this Cross Validated post, in survival analysis, the response variable is the time that has elapsed between events. Importantly, survival analysis deals with censoring-we might have incomplete information where the event of interest doesn’t occur in the duration of the game. I consider the treatment variable to be whether the game is played during the regular season or the playoffs. Also, I account for different situations in the game: 5-on-5 and power play (PP) situations. PP includes both 5 on 4 and 5 on 3 situations and accounts for roughly 25% of the time in my dataset. 5-on-5 situations make up 48% of the data.

Cox Proportional Hazards Model and Plot of Hazard Ratio

  • Cox Proportional Hazards Model is a popular regression technique in Survival Analysis, where the measure of interest is hazard ratios.
  • A hazard ratio greater than one indicates that as we move from the regular season to the playoffs, there is an increase in the event of interest.
  • A hazard ratio smaller than one indicates that as we move from the regular season to the playoffs, there is a decrease in the event of interest.
  • The article contains scatterplots of hazard ratios by event types that contain horizontal lines at a hazard ratio of 1, which signifies a constant rate between the regular season and playoffs.
  • If the confidence intervals for hazard ratios include 1.0, the difference between the regular season and the playoffs is not significant.

Survival Curves

  • The survival curve is another visualization that shows the difference in rates between regular season and playoffs, but the curve uses a log-rank test to see if the difference is significant.
  • I think of the relationship between the plots of hazard ratios and survival curves using a log-rank test as similar to that between confidence intervals (a.k.a hazard ratios) and hypothesis testing (a.k.a log-rank test). A one-to-one correspondence.
  • The graphs display two curves, one for playoffs and the other for the regular season.
  • In our graphs, the x-axis shows time during the game and the y-axis represents survival probability, which is the probability of our event not occuring.
  • For example, if our event is goals, the survival probability equals the probability of not scoring. As a result, the lower curve signifies higher rate of event.
  • If a curve decreases faster (curve is steeper), that means we are more likely to observe the event of interest in the early stages of a game.

Tools and Resources

My main tool for organizing the data is the tidyverse, an opinionated collection of R packages that share an underlying design philosophy, grammar, and data structures. For the survival analysis, I used additional R packages, including survival, survminer, ggfortify, and rms.

For additional research into the theory behind Survival Analysis, have a look at this blog post by Dustin Tran, research scientist at Google Brain. For a tutorial of Survival Analysis in R, click here for Alboukadel’s post. Alboukadel is the author of the survminer R package, which analyzes and draws patients’ survival curves.

Data

I will be using Play by Play (PBP) data from 2007-2008 to 2017-2018 NHL seasons provided by Emmanuel Perry, creator of Corsica Hockey. The dataset contains 56 variables and over 13 million rows. Noteworthy variables I’ll be using for my analysis include Session (P for Playoffs and R for Regular Season), Event Type(Goals, Hits, Blocked Shots, Penalties, etc), Game Strength State (5v5, 4v5, 5v4, etc), and Game Seconds (Time of event during game in seconds).

I wrangled the data into a format for survival analysis. Again, Session describes whether game was held during regular season(R) or playoffs(P). Event Type specifies whether event of interest (in this case, goals) happened (1). A value of 0 for Event Type refers to the end of the game. Time Difference is the survival time in minutes, or in this case the time intervals between goals.

Plot of Hazard Ratios

First, let’s look at the results for goals. The hazard ratio is 0.910, meaning that as we move from regular season to the playoffs, goals decrease by approximately 9 percent in 5-on-5 situations. This matches what prior work has shown: a Globe and Mail piece by James Mirtle reveals that the number of shots on goal, shot attempts and the amount of time spent on the power play are similar between the regular season and playoffs. However, the rate of goals drops because save percentage is higher in the playoffs.

This makes intuitive sense. In the playoffs, teams have better goaltending since backups no longer play games. Furthermore, teams that deeper into the playoffs tend to have higher quality goaltenders, who are likely to have a higher save percentage than regular season league average. In truth, my analysis reveals that this change in saving behavior makes goal scoring decrease by almost 10% in the playoffs.

The second result is the hazard ratio for hits, 1.319, signifying an increase of 32% in hits from regular season to playoffs in 5-on-5 situations. As the graph shows, the magnitude of the increase in hits greatly outweighs that of other events. Evidently, the playoffs show a “certain level of violence” stemming from referees calling less penalties in the playoffs, as described by Chris Peters in a CBS Sports article.

The hazard ratio for blocked shots is 1.077, illustrating that as we move from regular season to the playoffs, blocks increase by 7.7% in 5-on-5 situations. As mentioned before, postseason hockey is all about chasing the grand prize, the Stanley Cup. This Associated Press article reveals that players display dedication and sacrifice to attain the ultimate prize: “It’s something that there’s a high desperation level come playoffs and everybody’s doing it” says Ian Cole, who blocked 31 shots in nine games during the 2017 playoffs.”

Lastly, let’s observe coach’s challenges, a recent addition to the NHL rulebook. The hazard ratio is 0.916. In other words, coach’s challenge decreases by 8.5% in 5-on-5 situations from the regular season to playoffs. During postseason hockey, the coaches are more conservative since they don’t want to lose the opportunity to call a timeout and draw up a play in the last minutes of the game. Notably, the confidence interval for the hazard ratio of coach’s challenges is extremely wide, demonstrating that this change is not statistically significant. Thus, I would caution the reader from making a conclusion whenever we have this wide of a confidence interval for any events.

On the power play, the hazard ratio for goals is 1.022, meaning that as we move from regular season to the playoffs, goals change by a factor of 1.022, or increase by 2%. This result is validated by the fact that the top 16 teams in the league play postseason hockey, where only the best players elevate their game to compete for the Stanley Cup. The offensively gifted players, such as Sidney Crosby, Alexander Ovechkin, and Steven Stamkos, are lethal on the power play and convert chances at a higher rate than your average NHLer.

The distinction between the hazard ratio for goals at 5-on-5 (0.910) and on the power play (1.022) is worth discussing. At 5-on-5, I’d argue goalies essentially have five defenders protecting the net in different ways. Forwards are covering the points and fighting in puck battles along the boards, and defensemen are focused on guarding the area around the net by preventing forwards from screening the goalie and clearing rebounds. As a result, top notch goalies can perform better in the playoffs, leading to a decrease in the rate of goals. On the other hand, when defending a power play, goalies have one less player to help out. With the extra space gained from the man advantage, teams on the power play can more freely deploy their offensive weapons. In the playoffs, only the best remain and thus, we see a higher rate of goals.

Importantly, the change in rates of power play goals and shots between the regular season and playoffs is not statistically significant since the confidence intervals for both hazard ratios include 1.0. On the other hand, the change in rates of 5-on-5 goals and shots are statistically significant since the confidence intervals do not include hazard ratios of 1.0.

The hazard ratios of hits in 5v5 and PP situations are very similar (1.319 and 1.324). This reveals that the change in rate of hits doesn’t depend on game situations. Players will ramp up their game level intensity during the playoffs and play more violently in both 5-on-5 situations and on power plays. This creates for a playoff atmosphere that NHL fans all love and entails huge swings in momentum and an increase in injuries. After the playoffs, few players come out unscathed.

Let’s finally look at coach’s challenges. This event displays a wide confidence interval, similar to the confidence interval for 5-on-5 situations. Statistically, this can be interpreted as a consequence of small sample size or high standard deviation. This is due to two factors: the league only introduced coach’s challenges in 2015-2016 and they don’t occur regularly during games, certainly not as regularly as shots or hits. The former means that the Play by Play data consists mostly of seasons played before the league implemented this rule. The latter is partly due to a recent change in NHL Rule for Failed Offside Challenges: “the team that issued the challenge shall be assessed a minor penalty for delaying the game.” NHL commissioner Gary Bettman states: “We’re in effect trying to discourage using the coach’s challenge on offside unless you’re really 100 per cent certain that you’re going to win because it was a blown call. The coach’s challenge was really intended to focus on glaring errors. And by imposing a two-minute penalty if you’re wrong, it should limit the number of challenges to those instances where there’s a glaring error.”

5v5 Survival Curve

Legend

  • Red curve represents playoffs
  • Sky bluish curve represents the regular season.

Analysis

  • The first result is goals, where the log-rank test reveals the difference in curves is significant.
  • There are higher rates of goals in the regular season as its survival curve is lower than the playoff survival curve.
  • The graphs for hits and shots contain steeper survival curves, signaling that we are likely to observe a hit earlier in the game than goals, whose survival curves is more rounded.
  • This matches an apparent observation of hockey games; goals rarely, if ever, precede hits or shots.
  • Future work would be to analyze factors leading to difference in goal-scoring rates and shot rates. Shot angles, shot distance, or quality of goaltending are possible candidates.
  • The coach’s challenge survival curve during the playoffs looks significantly different.
  • The rate of coach’s challenge during playoffs is noticeably lower during later part of 2nd period and most of the 3rd period.

PP Survival Curve

Legend

  • Red curve represents playoffs
  • Sky bluish curve represents the regular season.

Analysis

  • On the power play, the goal survival curves look slightly different, but the log-rank test indicates that they are not statistically different.
  • This confirms the hazard ratio of goals, whose confidence interval includes 1.0.
  • Hits survival curves are the most different out of all the survival curves and demonstrate that the game level intensity jumps alot.
  • The blocked shot survival curves tell a similar story: There is higher attrition and risk of injuries in playoffs due to players putting their bodies on the line to win.
  • In terms of giveaways, the survival curves reveal that playoffs have higher rates of giveaways than regular season.
  • Fatigue and faster game tempo lead to more giveaways and exciting hockey for fans. Not so much for coaches.

Key Takeaways:

  • As we move from regular season to the playoffs, goals decrease by approximately 9 percent in 5-on-5 situations.
  • In 5-on-5 situations, there is an increase of 32% in hits and 7.7% in blocks from regular season to playoffs.
  • On the power play, goals increase by 2% from the regular season to the playoffs
  • Importantly, the change in rates of power play goals and shots between the regular season and playoffs is not statistically significant,
  • On the other hand, the change in rates of 5-on-5 goals and shots are statistically significant.
  • The change in rate of hits doesn’t depend on game situations.
  • The steepness of survival curves show that goals rarely, if ever, precede hits or shots
  • Power play survival curves for hits are the most different out of all the survival curves.

The author would like to thank Sam Ventura for the original idea for this article and feedback, Alex Novet for his feedback, and Corey Sznajder and Evan Oppenheimer for reading over the initial version of this article.

Code is available on my GitHub

Appendix