Game State Wins - A Functional Relationship
I get anxious with statistics that depend on the game state, particularly in regard to baseball. A state stat is one whose value depends on the events that preceded it. Examples of state stats include whether you will draw an ace on the river, how much money you will make on your job, today's temperature, RBIs, and pitcher wins. Stateless, or unbiased, stats are either completely independent of the events that preceded them or filter out those events. Examples of these are the roll of the dice, a coin flip, the next digit after the 34th occurrence of the digit 6 in the number π, and OPS.
Much effort has been spent producing measures that isolate a hitter's productivity from the context of their teammates' production. No self-respecting sabermetrician uses RBIs or pitcher wins as a measure of player production. On the other hand, OPS is easy to compute, correlates well with team run production, makes physical sense as a predictor, and is even used occasionally by some of the talk radio blockheads to compare players.
Then along rolls Wins Above Replacement (WAR), which is everything that OPS is not: hard to understand, even harder to compute, difficult to say whether it tells if a player is any good, and routinely used by blithering idiots to make a pointless point. It's like coming back to RBIs and pitching wins to measure production, but on mathematical steroids. And yet it is irresistible: it provides a better, more rational, way to compute what each batter contributes to the team's win. The question is out there: Who produced the most wins for their team? Somebody did, but who? So we jump through hoops to find out who it is.
Please note, that I am among those who subscribe to the belief that there are no clutch players, just players who get to play those very few games when all the cameras are on and announcers engage in hyperbolic gushing in order to keep you in your seat long enough to hear the next beer commercial. So maybe I am prejudiced against state stats. Or maybe state stats will provide sufficient evidence to either confirm or refute my prejudices.
In either case it's peculiar that I would engage in the arduous task of measuring this stuff, particularly as it's already been done so capably in Baseball Reference (BR). The simplest reason is I thought it made for interesting analysis and computing, stuff I like to fill my idle moments with. When I started I had no idea how BR computed WAR, and didn't want to know. I was afraid that the approach I took would exactly match that taken by BR, rendering this whole exercise moot. Nor am I claiming in any way that the approach I took is in any way superior to the BR approach or that anyone should use my numbers over BR. In fact, I'm scared to death that there are faulty fundamental assumptions or computing errors in my results, which is possible given the amount of resources I could give to this project.
The point of this documentation is partly for public discussion, but also because if I don't write this down I will, myself, forget what I did. And I would like to continue this analysis over a couple years, so forgetting is a bad option.
If it helps, for the purposes of the remainder of this discussion, you may think of WAR and Game State Wins (GSW) as synonymous. If that thought offends you, strike it from your memory immediately.
So, what is a Game State Win, and how is it computed? The GSW is the direct extension of the Inning State Run (ISR), which is simpler, has more available data, and permits little room for variation in analysis. The ISR has been used to analyze whether stolen bases & sacrifice bunts are effective strategies. So let's start with the ISR and build from there.
The ISR looks at every plate appearance of every game and asks: "Given this situation, how many runs should we expect the batting team to score in the rest of this inning?" The word "situation" in the previous sentence refers to what is called an inning state. The Inning State is composed of the number of outs and which bases have runners on them. There are 24 different inning states that a batter and pitcher can encounter in a plate appearance: there are either 0, 1, or 2 outs and there are 8 different combinations of runners on first, second, or third.
Associated with each of the 24 inning states is the number of runs that the batting team expects to score in the inning given the current situation. For example, we expect to score more runs when the situation is bases loaded with no one out than we do when there are two outs with no one on. Those are the extremes, and our expectations for all situations fall somewhere in between those two states.
Since there are so few inning states (24) and so many plate appearances over the course of the year (~ 200,000), there is a lot of data for each state. In fact, if you consider what I call Inning State Zero, which is no one out, no one on, it is guaranteed to occur at the start of each and every half inning. So it occurs over 45,000 times each year. Even the least visited states occur at least 400 times during the year. So there is plenty of data for each state.
If you add up all the runs scored in the remainder of the inning from any particular state, and then divide by the number of times the state occurs, you come up with the expected number of runs scored from that state. The 2013 data for each inning state is shown in Table 1.
code | occurs | runs | expRuns |
0ooo | 45582 | 21040 | 0.4616 |
0xoo | 10993 | 8978 | 0.8167 |
0oxo | 3357 | 3668 | 1.0926 |
0xxo | 2584 | 3580 | 1.3855 |
0oox | 476 | 625 | 1.313 |
0xox | 977 | 1761 | 1.8025 |
0oxx | 601 | 1200 | 1.9967 |
0xxx | 630 | 1367 | 2.1698 |
1ooo | 32860 | 7994 | 0.2433 |
1xoo | 13064 | 6482 | 0.4962 |
1oxo | 5647 | 3517 | 0.6228 |
1xxo | 4608 | 3869 | 0.8396 |
1oox | 1834 | 1682 | 0.9171 |
1xox | 2087 | 2327 | 1.115 |
1oxx | 1526 | 2087 | 1.3676 |
1xxx | 1619 | 2529 | 1.5621 |
2ooo | 26168 | 2425 | 0.0927 |
2xoo | 13378 | 2860 | 0.2138 |
2oxo | 7305 | 2227 | 0.3049 |
2xxo | 5750 | 2328 | 0.4049 |
2oox | 2948 | 1020 | 0.346 |
2xox | 2945 | 1433 | 0.4866 |
2oxx | 1883 | 1032 | 0.5481 |
2xxx | 2007 | 1442 | 0.7185 |
3ooo | 0 | 0 | 0 |
The notation I use for the inning state code is IS=Nzzz, where N is the number of outs (0,1,2) and each z is either o (base open) or x (runner on base) for first, second, and third base, in that order. So 1oxo means one out, runner on second.
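For anyone who wants to play along, here is a minimal Python sketch of the Table 1 arithmetic. It is an illustration only, not the code used to build the table: the function names are made up for this example, and only three rows of the table are copied in.

```python
# Three rows copied from Table 1 (2013): code -> (occurrences, runs scored afterward).
TABLE_1_SAMPLE = {
    "0ooo": (45582, 21040),
    "0xxx": (630, 1367),
    "2ooo": (26168, 2425),
}


def parse_inning_state(code):
    """Split a code such as '1oxo' into (outs, on_first, on_second, on_third)."""
    outs = int(code[0])
    on_first, on_second, on_third = (ch == "x" for ch in code[1:4])
    return outs, on_first, on_second, on_third


def expected_runs(occurs, runs):
    """Average runs scored in the rest of the inning; 0 for a never-visited state."""
    return runs / occurs if occurs else 0.0


for code, (occurs, runs) in TABLE_1_SAMPLE.items():
    print(code, parse_inning_state(code), round(expected_runs(occurs, runs), 4))
```

The zero-occurrence guard is there only so that a row like 3ooo does not divide by zero.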
The first important property of Table 1 is that each state occurs a significant number of times. Sure, some situations occur far more frequently than the others, but all the points have data attached to them. Secondly, and just as importantly, the data trends in the right direction. That is, when we go from no one on, no one out to no one on, one out, the number of expected runs decreases. Or, conversely, when we go from no one out, no one on to no one out, runner on first, the number of expected runs increases. This is crucial, because inning state runs depend on the difference in the number of expected runs between inning states.
During virtually all successive plate appearances, the inning state changes from one state to another. The only exception is to hit a home run with nobody on base. The number of Inning State Runs awarded to the batter during any plate appearance is equal to the number of runs that scored during the at-bat, plus the expected number of runs for the Inning State at the end of the plate appearance, minus the expected number of runs for the Inning State at the beginning of the plate appearance.
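The rule in the paragraph above is short enough to write down directly. This is a sketch under the assumption that the expected-run values come straight from Table 1; the function name and the three sample plays are mine, chosen only to show the sign of the result.

```python
# Expected runs for a few inning states, copied from Table 1 (2013).
EXP_RUNS = {"0ooo": 0.4616, "0xoo": 0.8167, "1ooo": 0.2433, "3ooo": 0.0}


def inning_state_runs(start_state, end_state, runs_scored, exp_runs=EXP_RUNS):
    """Runs scored on the play, plus expected runs after, minus expected runs before."""
    return runs_scored + exp_runs[end_state] - exp_runs[start_state]


single = inning_state_runs("0ooo", "0xoo", 0)   # leadoff single
out = inning_state_runs("0ooo", "1ooo", 0)      # leadoff out
homer = inning_state_runs("0ooo", "0ooo", 1)    # solo homer: state unchanged, one run in

print(f"single {single:+.4f}, out {out:+.4f}, homer {homer:+.4f}")
```

A leadoff single is worth a bit more than a third of a run, a leadoff out costs a bit more than a fifth of a run, and the solo homer is worth exactly the run it scored.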
ISR for batters is an improvement on RBIs; while RBIs give full credit to the last guy in the run producing chain, ISR allocates the credit more evenly and rationally across all the participants. Additionally, ISR has value for pitchers as an improvement on ERA. If a starter begins the inning by walking the bases full, gets pulled, and the reliever then strikes out the next three batters (I wonder if this has ever happened?), then the starter has no runs charged against his record and the reliever only gets credit for 1.0 innings worked without letting in a run. In reality, the starter should be charged with the 1.75 expected runs he put on base (2.2 for the bases loaded less the .45 runs every pitcher inherits when he starts the inning) when he walked off the mound, and the reliever should be credited with saving the 2.2 expected runs he was faced with when he took the mound. ERA definitely has some bias for starters who are backed by a bullpen that can bail them out of trouble.
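The same bookkeeping can be sketched from the pitcher's side. This is a hypothetical illustration, not part of the original analysis; it uses the unrounded Table 1 values for 0ooo and 0xxx rather than the rounded .45 and 2.2 in the paragraph above, so the starter's charge comes out a shade lower than 1.75.

```python
# Expected runs copied from Table 1 (2013); "3ooo" is the end of the half inning.
EXP_RUNS = {"0ooo": 0.4616, "0xxx": 2.1698, "3ooo": 0.0}


def runs_charged(start_state, end_state, runs_allowed, exp_runs=EXP_RUNS):
    """Runs allowed plus the change in expected runs while the pitcher was in."""
    return runs_allowed + exp_runs[end_state] - exp_runs[start_state]


starter = runs_charged("0ooo", "0xxx", 0)    # walks the bases full, then is pulled
reliever = runs_charged("0xxx", "3ooo", 0)   # strikes out the side

print(f"starter charged {starter:+.2f}, reliever charged {reliever:+.2f}")
```

A negative charge is a credit. The two numbers sum to minus the 0.46 runs the inning started with, which is what you would expect from a half inning that ended with no runs in.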
Speaking of situations that may have never happened, is there a case where a manager uses his best pitcher to get the team out of these highly levered jams? Closers almost always just pitch the ninth, starting with none out and going to the end of the game. In these situations, the closer can at best save only 0.45 runs. In the bases loaded, none out situation there are 2.2 runs on the table to be erased. Instead of using your best pitcher to erase these runs, we usually get a parade of lefty/righty switches, almost entirely built of players rated no higher than 17th on a 25 man roster, and frequently using a pitcher with replacement player caliber.
ISR analysis on stolen bases is well documented; look it up, or better, work it out. The ultimate bad strategy is to put a fast runner at the top of the lineup who has a lousy OBA and then when he does get on, steals often at a success rate that's not above breakeven.
Finally, ISR, like RBI or pitcher wins, is not a context free statistic like OPS. By definition, teams that score a lot of runs will be full of players with high ISRs. It says nothing about what happens if a hitter is traded from a high scoring team to a low scoring team. That said, ISR is still an improvement on RBI, ERA, and pitcher wins.
Important points regarding ISR are that there are a limited number of InningStates, all of them occur a significant number of times, so the data collected for them is meaningful, and the analysis of looking at the number of runs saved in moving from one InningState to another is valid.
Game State Wins (GSW), or as it is popularly known, WAR (Wins Above Replacement), is a logical extension of the ISR. While ISR measures the contribution to scoring runs in an inning, GSW measures the contribution to a team win during a game. The situation under analysis here is the Game State, to which can be attached the likelihood that the team will win the game. The Game State is the combination of the following things: the half-inning, how many runs the team is winning by, and the InningState. So how many Game States are there? Technically, it's infinite, as there is no upper bound to a team lead, but for any practical purposes, it is bounded. There are 18 half-innings if we say that the ninth inning and all extra innings are equivalent. If we invoke a mercy rule and say once a team has a ten run lead it will certainly win the game then there are 21 team leads (the integers between -10 and +10, inclusive). Finally we have 24 Inning States. So practically speaking there are 18 times 21 times 24 = 9,072 Game States. That makes over 9000 feasible Game States. Bounded, yes, but many, many more states than there are InningStates.
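The count is easy to verify by enumeration. In this sketch the tuple layout for a game state is my own choice for illustration; the three ingredients (18 half-innings, leads capped at plus or minus 10, 24 inning states) are the ones described above.

```python
from itertools import product

# 9 innings x top/bottom, with extra innings folded into the ninth.
HALF_INNINGS = [(inning, half) for inning in range(1, 10) for half in ("top", "bottom")]
# Home team lead, capped by the "mercy rule" at -10..+10.
LEADS = range(-10, 11)
# 0, 1, or 2 outs combined with every open/occupied pattern on the three bases.
INNING_STATES = [f"{outs}{a}{b}{c}" for outs in range(3) for a, b, c in product("ox", repeat=3)]

game_states = list(product(HALF_INNINGS, LEADS, INNING_STATES))
print(len(HALF_INNINGS), len(LEADS), len(INNING_STATES), len(game_states))
```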
Attached to each Game State is the number of times it has occurred, and the number of times the home team has won from this state, producing the probability of winning from this Game State. For each plate appearance, the Game State Win can be computed from the difference between the win probability of the Game State that existed when the plate appearance started and the win probability of the GameState when the play ended.
As you may suspect, a number of the 9000 Game States are never visited. For example, I reckon that the GameState of top of the first, nobody out, bases loaded and the visitors already leading by 10 runs has never, ever happened. (Has it?) Some GameStates are guaranteed to happen often (the start of every game, for example), while others happen less often, others rarely, and some not at all. Worse, it may be possible that an unvisited GameState may have two "neighbors" that have been visited, causing "holes" in the GameState data "map". Even worse, because of the paucity of data for some GameStates, it is possible for the measured win probability of one Game State to be lower than that of a neighboring but "worse" Game State. In other words, using measured data alone, a player could earn positive GSW for striking out.
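Both problems are easy to reproduce with a toy example. Everything in this sketch is hypothetical: the tallies are invented to make the point, the state keys are an ad hoc tuple of (half-inning label, home lead, inning state), and the home team is assumed to be batting so that a rise in home win probability is credit to the batter.

```python
# Hypothetical tallies: game state -> (times visited, times the home team went on to win).
COUNTS = {
    ("B7", 0, "1xoo"): (40, 24),
    ("B7", 0, "2xoo"): (5, 4),   # tiny sample that happens to look "better" with an extra out
}


def empirical_pw(state, counts=COUNTS):
    """Measured home win probability, or None if the state was never visited."""
    visits, home_wins = counts.get(state, (0, 0))
    return home_wins / visits if visits else None


def game_state_win(start, end, counts=COUNTS):
    """Change in measured win probability over one play; None if either state is a hole."""
    p_start, p_end = empirical_pw(start, counts), empirical_pw(end, counts)
    if p_start is None or p_end is None:
        return None
    return p_end - p_start


strikeout = game_state_win(("B7", 0, "1xoo"), ("B7", 0, "2xoo"))
hole = game_state_win(("B7", 0, "1xoo"), ("B7", 0, "1xxo"))
print(strikeout, hole)
```

With these made-up counts the strikeout earns positive credit, and the play that ends in an unvisited state cannot be scored at all.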
So while I would like to base Game State analysis strictly on historical data, there are issues with sparse data leading to inconsistent results, and sometimes there is no data to use at all.
I'm not entirely clear how Baseball Reference measures win probability from a Game State. I have seen references to the Pythagorean formula to get from runs to wins, and I have seen references to a simulation (very cool) carried out from a Game State. So I really don't know, but I don't think it's what I'm about to propose here. I was looking to establish a mathematical relationship, or function, that would algorithmically provide a value of the expected number of wins for each and every Game State. Sounds ambitious if not impossible, but let's try.
There are three principles that such a function must satisfy:
- It must match the historical data for the GameStates visited most often. So it must be damn near equal to the historical win probability (P_{w}) for the beginning of the game, a state guaranteed to be visited once every single game. Its correlation to the measured historical P_{w} for rarely visited states may not be as good.
- It must be a smoothly monotonically trending function with regard to Game State within each half inning. By that I mean, if you do good during a plate appearance, you should be rewarded with a positive number of GSW, and the opposite if you mess up. For all Game States, no exceptions.
- The function must have a value for all possible Game States, whether they have ever occurred or not. Also, the value of P_{w} for any Game State must be a real number between 0.0 and 1.0, exclusive. The value cannot be either 0.0 or 1.0 because that would imply certainty that the game is won or lost, which, we have been taught from birth, never happens until the final out is made.
Let's start by stating that the probability of the home team winning (P_{w}) is a function of the GameState.
That is: P_{w} = f(GS)
and that the GameState is the combination of inning (inn), whether the home team is at bat (λ), the home team lead (L_{h}) and the inning state (IS).
So: GS = g(inn, λ, L_{h}, IS)
remember that IS = h(outs, runnerOnFirst, runnerOnSecond, runnerOnThird).
The first simplification is that the top and bottom of innings will be analyzed separately. It will be as if there are two separate analyses, one for when the home team is up and one when the visitors are up. It will be left for the end of the analysis to see if the two sets of data make sense relative to one another.
The second simplification is to consider only the first play of each inning (InningState = 0ooo). Again it is left for later how to incorporate inning states other than 0ooo.
Now there are just two variables left to consider: L_{h}, the home team lead, and inn, the inning number.
The behavior of P_{w} with respect to L_{h} while holding inn constant is that P_{w} increases as L_{h} increases for any L_{h} and inn.
The behavior of P_{w} with respect to inn holding L_{h} constant is that as inn increases the P_{w} increases for any positive L_{h}, and decreases for any negative L_{h}. That is, as we get later in the game, it becomes more likely the home team will win if it is leading because there are fewer chances for the visitors to catch up. Similarly, if the home team is losing late in the game there are fewer chances for it to catch up than early in the game.
Two other constraints from principle 3: No matter how large L_{h} is, P_{w} approaches but is never equal to one. And no matter how negative L_{h} is, P_{w} approaches but is never equal to zero.
The function I chose to represent the behavior of P_{w} with respect to L_{h} is the cumulative normal distribution function (CNDF). The shape of this function is good because it looks like an "S" curve when plotted on a chart where the X axis is the home team lead, and the Y axis is the probability of a home team win. It is monotonically increasing, and never reaches either zero or one.
The CNDF is defined by two parameters, the mean of the distribution (μ), and its standard deviation (σ).
The two charts on the right show how the two parameters can be used to control the position and shape of the S curve. Increasing the value of μ causes the curve to shift from left to right, while increasing the value of σ flattens out the curve.
For symmetric distributions such as this one, the mean equals the median. In this analysis, the median is the home team lead where the home team (and the visitors) has exactly a 50% chance of winning. Also, in this analysis, I will be using the term variance instead of standard deviation. Strictly speaking, variance is the square of the standard deviation, but as a word it is more concise and descriptive, so wherever I say variance I mean σ. The variance determines the steepness of the curve. A third measure, called the rms error, describes how well the data actually conforms to the curve.
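For readers who want to poke at the S curve themselves, the CNDF can be written with nothing more than the error function in Python's standard library. This is a minimal sketch; the μ and σ values below are placeholders picked for illustration, not fitted numbers.

```python
import math


def cndf(x, mu, sigma):
    """Cumulative normal distribution: P(X <= x) for X ~ Normal(mu, sigma)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


mu, sigma = -0.7, 2.8   # placeholder parameters, for illustration only

# At a lead equal to mu the curve sits at exactly 50%.
print(cndf(mu, mu, sigma))

# The curve rises with the lead: each step to the right is a higher win probability.
print([round(cndf(lead, mu, sigma), 3) for lead in range(-4, 5, 2)])

# A larger sigma flattens the curve: the same 2-run lead is worth less.
print(round(cndf(2, 0.0, 1.0), 3), round(cndf(2, 0.0, 3.0), 3))
```

The standard library's `statistics.NormalDist(mu, sigma).cdf(x)` computes the same quantity if you would rather not write the formula out.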
The P_{w} measured from 2013 data for the start of the bottom of the fifth inning, for home team leads ranging from -10 to +10, is plotted on the chart on the right, along with a CNDF curve where μ = -0.67678 and σ = 2.76799. The correlation between the measured data and the CNDF curve is excellent, so it's looking good from a practical sense. It's worth asking: Does using the CNDF function make any theoretical sense, or does it just empirically fit the data in a nice way?
The CNDF is derived from its more common cousin, the normal distribution function, sometimes called the Gaussian distribution, and often referred to as the bell shaped curve. The normal function shows the distribution of values a variable can take when it starts at a certain value and is then allowed to take a random walk around that value. So if we say that at any Game State the home team has a certain lead, and that the lead may randomly grow or shrink over successive innings, then the normal distribution is a good predictor as to whether the lead will shrink to zero before the end of the game.
The strategy now is to take all of the data for each inning holding λ=bottomOfInning & IS=0ooo and find the values for μ and σ for the CNDF that has the lowest rmsError for the plot of P_{w} vs. L_{h} for that inning.
Let's call P_{w0} = P_{w}[IS=0ooo,λ=homeUp] = CNDF(L_{h}, μ, σ)
where
μ = q(inn), and σ = r(inn)
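To make the fitting step concrete, here is a sketch of finding the μ and σ with the lowest rms error for one inning. Two loud assumptions: the search is a plain brute-force grid, which is my stand-in and not necessarily the method used for this analysis, and the data points are synthetic, generated from a known curve so that the search has a right answer to find.

```python
import math


def cndf(x, mu, sigma):
    """Cumulative normal distribution function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def rms_error(points, mu, sigma):
    """Root-mean-square gap between measured P_w and the CNDF curve."""
    squared = [(pw - cndf(lead, mu, sigma)) ** 2 for lead, pw in points]
    return math.sqrt(sum(squared) / len(squared))


def fit_cndf(points, mu_grid, sigma_grid):
    """Return the (mu, sigma) pair on the grid with the lowest rms error."""
    candidates = ((mu, sigma) for mu in mu_grid for sigma in sigma_grid)
    return min(candidates, key=lambda p: rms_error(points, p[0], p[1]))


# Synthetic "measurements": (home lead, P_w) pairs taken from a known curve.
points = [(lead, cndf(lead, -0.7, 2.8)) for lead in range(-10, 11)]

mu_grid = [i / 20 for i in range(-30, 11)]     # -1.50 .. 0.50 in steps of 0.05
sigma_grid = [i / 20 for i in range(20, 101)]  #  1.00 .. 5.00 in steps of 0.05

best_mu, best_sigma = fit_cndf(points, mu_grid, sigma_grid)
print(best_mu, best_sigma, rms_error(points, best_mu, best_sigma))
```

With real data each point would be a measured win fraction for one home team lead, and it would probably make sense to weight each point by how often that lead occurred; the unweighted version here is only the simplest thing that works.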
Also, before showing the results it's worth asking: What do we expect from μ and σ? You might say we are headed towards forcing the data to fit my prejudices ... maybe true.
μ is the home team lead at which both the home team and visitors are equally likely to win. Let's state that with the score tied (L_{h}=0) and the home team coming up to bat, the home team has a better chance of winning because they will have an extra half inning in which to score more runs. So μ must be less than 0. This means even if the home team is losing by a small amount (like half a run) then it has an even chance of winning because it has more chances of scoring.
The slope of μ with respect to inning should behave such that later in the game the home team has an even greater advantage because the percentage of at bats relative to the visitors' is even greater late in the game. However the slope of μ has to be coordinated with the behavior of σ. Let's just say we hope that the slope of μ is simple and well behaved.
When the visitors come up to bat with the score tied, each team has an equal number of opportunities to build a lead and win, so we would expect μ for the top of the inning to be closer to zero than when the home team comes to bat.
σ is the variance, or steepness, of the CNDF function. σ is greater than 0 by definition. Large values produce a flatter CNDF, while smaller values produce a steeper CNDF. A steep CNDF implies that a given positive home team lead makes a home team victory more certain. So we may expect that as we get later into a game the value of σ decreases with each inning as the opportunity to lose or come back from a lead becomes more limited.
We would expect this behavior of σ whether the home team or the visitors are at bat.
Charts 1a, 1b, and 1c show the μ, σ, and rms error as function of inning when the home team is up and IS=0ooo, 1ooo, & 2ooo.
If we look at charts 1a, b, & c from the third inning to the end of the game we would say we have very well behaved data. μ starts out negative and increases monotonically towards the end of the game, approaching but never reaching zero. σ is positive and decreases monotonically, approaching zero in the late innings. Pretty good. But what is going on in the early innings with μ, especially the first inning? Note also that the rms error for the first inning is substantially different from the other innings. I have also seen this behavior in 2012 data so it's not a seasonal aberration. I don't mind curvature in the function but a reversal in the sign of the slope is going to wreak havoc on the three principles that the functional relationship must satisfy.
It's doubly strange in that the GameState of [L_{h}=0, inn=1, visitorsUp, IS=0ooo] at the beginning of each game is the most visited game state in the entire domain, guaranteed to be visited once every game. You would think this would provide a steady anchor to the data. But no. The accuracy of μ is doomed because large leads are more rare in the first inning than in later innings, and therefore the tails of the CNDF distribution in the first inning have sparse data, providing strange results.
The chart to the right shows the number of times each game state was visited. This chart shows the paucity of first inning data relative to other innings. Remember, this chart is for game states at the start of the inning only (IS = 0ooo). In order for the visitors to have any lead at the start of the top of the first inning, they must hit a home run, adding to their lead, clearing the bases, not adding any outs, effectively restarting the game, but this time with a visitors lead. Similar arguments can be made for the home team in the bottom of the first. Doesn't happen all that often.
Note that the distribution of occurrences of game states for each half inning looks a lot like a normal distribution centered on a zero run lead. Lends some justification for using the cumulative normal distribution to fit win probability vs. home team lead.
Now comes the magic of curve fitting μ and σ to the inning number. This is like curve fitting the curve fit for the CNDF for each inning. Hopefully we are not getting too removed from reality. And, in a move likely to draw catcalls from the peanut gallery, I am going to drop the first inning data on the basis of a high rms error and its ugly behavior.
Charts 3a & 3b show the data for μ and σ vs. inning when the home team is up. Each set of data is fitted with a linear least squares line and a quadratic least squares line. Charts 4a & 4b do the same for when the away team is up.
It turns out in each case the quadratic line fits the data better than the linear line. The quadratic curve does a very good job in fitting the data in each case with an rms error on the order of 0.03 in each case.
The 2013 data suggests the following models for μ and σ:
home team up:
μ = ( .01238 * (inning)^{2}) - (.07524 * inning) - 0.60321
σ = (-.04345 * (inning)^{2}) + (.10179 * inning) + 3.34529
visiting team up:
μ = ( .00310 * (inning)^{2}) + (.00619 * inning) - 0.33893
σ = (-.04583 * (inning)^{2}) + (.14226 * inning) + 3.40714
and the P_{w} for the home team at the beginning of each inning (IS=0ooo) is:
P_{w0} = CNDF(L_{h}, μ(i), σ(i))
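Here is a sketch that plugs the 2013 quadratic fits above into the CNDF. The coefficients are copied from the formulas; the function names are mine, and treating extra innings as the ninth is an assumption carried over from the earlier count of half-innings.

```python
import math

# Quadratic coefficients (a, b, c) for a*inning**2 + b*inning + c, from the 2013 fits above.
COEFFS = {
    "home":     {"mu": (0.01238, -0.07524, -0.60321), "sigma": (-0.04345, 0.10179, 3.34529)},
    "visitors": {"mu": (0.00310, 0.00619, -0.33893), "sigma": (-0.04583, 0.14226, 3.40714)},
}


def quad(coeffs, inning):
    """Evaluate a*inning**2 + b*inning + c."""
    a, b, c = coeffs
    return a * inning ** 2 + b * inning + c


def cndf(x, mu, sigma):
    """Cumulative normal distribution function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def pw0(inning, batting, home_lead):
    """Home win probability at the start of a half inning (IS=0ooo)."""
    inning = min(inning, 9)   # assumption: extra innings are treated like the ninth
    mu = quad(COEFFS[batting]["mu"], inning)
    sigma = quad(COEFFS[batting]["sigma"], inning)
    return cndf(home_lead, mu, sigma)


print(round(pw0(1, "visitors", 0), 3))   # first pitch of the game, score tied
print(round(pw0(4, "home", 1), 3))       # bottom of the fourth, home team up by one
print(round(pw0(4, "home", 2), 3))       # bottom of the fourth, home team up by two
```

The start-of-game value comes out a little under 0.54, so the home team starts with an edge before anyone has batted.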
Before making the final correction for when the IS is not 0ooo, let's look at the behavior of P_{w0} over each half inning and each home team lead versus the actual data.
Chart 5 displays a set of seven curves that show the probability of a home team win over a range of home team leads and innings. All the data is for the start of each inning (IS = 0ooo). The curves marked with a "C" are the computed values, while the symbols marked with an "A" are the actual data points for 2013. The number on each curve indicates the home team lead. The curves all show a sawtooth pattern because the home team always has a better chance of winning in the bottom of the inning where it has an extra turn at bat. The curves start out somewhat bunched at the start of the game and then flare out to zero and one late in the game.
When I first charted this data I did the computed curves first, then later added the actual data. When I looked at the computed curves by themselves I thought there was too much flare in the curves towards the end of the game. I mean, top of the ninth, home team with a 2 run lead, ends up losing the game only about 2% of the time. As a Mets fan, I must have seen all of those 2% games. But when you overlay the actual data on the curves you see the flare is verified by the facts.
The curves do a remarkable job of fitting the actual data points, with the exception of the first inning. Can you find the actual point for a 3 run home team lead in the bottom of the first? The reason the first inning data fits so poorly is that these game states are rarely visited.
Remember that the analysis only considered the inning number and the home team lead. It held the IS constant at 0ooo and considered the data when the home team is up and when the visitors are up separately, leaving questions of consistency between the top and bottom of the inning until later. Well, later is now. Chart 5 has both the top and bottom of each inning intermingled chronologically on the X axis. So there are really two sets of data on Chart 5 intermingled with each other. The fact that the curves form a sawtooth for a constant L_{h} is a good indicator that the two sets of data are consistent with each other. We expect the chances for a home team win to be better with the score tied in the bottom of the third than with the score tied in either the top of the third or the top of the fourth inning. Also we expect the chances for a home team win to be worse with the score tied in the bottom of the third than with a 1 run home team lead in the top of the third or the top of the fourth inning.
There's a theoretical value for the size of the sawtooth. It's based on the fact that the expected number of runs scored from IS=0ooo is 0.46 and the expected number of runs scored when there are three outs is zero. Also, use the supposition that the game state when there are three outs in the top of the fourth inning is identical to the game state at the start of the bottom of the fourth. This indicates that the probability of a home team win from a tied score in the bottom of the fourth should be 0.42 of the distance between the value of P_{w} for the tied score in the top of the fourth and the P_{w} for a one run home team lead in the top of the fourth. I tried using this fact to plot away team values off of the home team data set, but the correlation with the actual data was worse and the curve got really ugly in the late innings.
I got concerned when the sawtooth failed to hold for large leads in the late innings. These are areas where the curve is "flaring" towards values of 0 and 1. I tried some tricks to get the sawtooth to hold, but these were tricks that tear down the simplicity of the model, and again, actually fit the actual points worse than on Chart 5. Given the agreement of the computed and actual data in the late innings, I am willing to live with the flare wiping out the sawtooth late.
Chart 5 gives us a template with which we can evaluate the win probability from any game state. The data on chart 5 incorporates three of the four elements of a game state: the inning, top/bottom of the inning, and the home team lead. What's missing is the inning state, as all the data shown is for IS = 0ooo (the start of the inning). One approach might be to repeat this analysis for each of the 23 other inning states. However, the number of game states visited for inning states other than 0ooo drops off dramatically, and the analysis quickly starts looking like the first inning when IS=0ooo.
A way that does not violate the three principles guiding this analysis is to interpolate the win probability on chart 5 from the inning state data shown in Table 1. Each inning state has an expected number of runs scored during the inning associated with it. For IS=0ooo the expected runs in 2013 are 0.462 runs. For game states with any other inning state, find the differential number of expected runs from the 0.462 runs expected when IS=0ooo. Take those differential expected runs and add them to (or subtract them from) the actual home team lead. Use that number of runs to interpolate vertically between the curves shown on chart 5.
For example, consider a batter who comes up to lead off for the home team in the bottom of the fourth with his team up by one. The curves say that the win probability for this game state (inn=4, λ=home, L_{h}=+1, IS=0ooo) is 0.713. The batter singles, causing the inning state to transition to IS=0xoo. The number of expected runs for IS=0xoo is 0.817, so the number of differential runs from 0ooo is 0.817 - 0.462 = 0.355 runs. So we will want to find a point on chart 5 during the bottom of the fourth inning where the home team lead is 1.355 runs. The win probability when the lead is one was previously found to be 0.713, and the win probability when the lead is two is 0.813. Interpolating between these two points to find a win probability value when the lead is 1.355 produces 0.748 wins.
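The worked example above can be sketched in a few lines. This assumes the home team is batting (so extra expected runs are added to the home lead) and takes the two curve values, 0.713 and 0.813, straight from the paragraph; the function and variable names are hypothetical.

```python
import math


def interpolate_pw(home_lead, exp_runs_state, exp_runs_0ooo, pw_at_lead):
    """Win probability for a non-0ooo inning state, home team batting.

    pw_at_lead is any callable giving the start-of-inning win probability
    for an integer home team lead in the current half inning.
    """
    effective_lead = home_lead + (exp_runs_state - exp_runs_0ooo)
    lower = math.floor(effective_lead)
    fraction = effective_lead - lower
    p_lower, p_upper = pw_at_lead(lower), pw_at_lead(lower + 1)
    return p_lower + fraction * (p_upper - p_lower)


# Bottom of the fourth, values quoted in the text for leads of one and two runs.
curve_b4 = {1: 0.713, 2: 0.813}

pw_after_single = interpolate_pw(1, 0.817, 0.462, curve_b4.__getitem__)
gsw_for_single = pw_after_single - curve_b4[1]
print(round(pw_after_single, 4), round(gsw_for_single, 4))
```

The interpolated value lands at about 0.7485, in line with the 0.748 above, so the leadoff single is worth roughly 0.035 wins to the batter.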
This technique allows for expected run valuations that are consistent with the three principles: it uses data from game states most often visited, it always rewards good plays and penalizes bad plays, and while it is harder to give up or overcome leads late in the game, the win probability never hits zero or one until the game is over.
So how different are the results of this analysis from those published in Baseball Reference? Probably not a whole lot, though the obvious departure is at the start of the game. BR starts every game with a 50-50 chance of each team winning. This makes sense if you are computing the rest of the game using a simulation, as the chances of things happening to each team are equal. In this analysis the chance of the home team winning before anyone steps to the plate is 53.8%. More than any other piece of data in this analysis, this data point can be considered bedrock. It is the one game state guaranteed to be visited once every game, and it is by far the most visited of all game states.
Also, I dare say, though I don't have solid data, that this analysis produces win probabilities closer to zero and one late in the game than the BR analysis does. In any case, assigning games won to each player on each play during a game relies on the differential win probability between two game states and not the actual value of win probability at each game state. So it doesn't matter all that much if the home team win probability at the start of the game is 0.500 or 0.538.
So what's to be learned from any of this? Besides that the actual data is nasty, with holes in number of occurences and occasional reversal in the P_{w} trends, that curves that reasonably represent the actual data for the majority of occurences can be done. Chart 5 also shows the powerful effect of playing as the home team. The zero run lead curve is always above P_{w}=0.500 for all innings. This bias is carried on for other leads and other innings throughout.
Also interesting are the actual points (other than the first inning) that do not fall on the calculated lines. I'm looking specifically at the tied score in the sixth and seventh innings. I would like to think this is the seamy soft underbelly of every game, when the starter is out or tired and you haven't gotten to the back end of the bullpen. Makes a good case for deep bullpens, not just stellar closers. This is where field managers and general managers make their money, the most unpredictable part of the game.
Let's finish with a gratuitous pot shot at game state analysis. One reason that OPS was developed and is in such popular use today is because it is unbiased with regard to team context around the player. The reason that pitcher wins and RBIs are no longer held in high regard is because their values reflect as much their team support as they do individual accomplishment. OPS isolates the individual achievement, but it also correlates very well to the actual number of runs produced by the MLB universe.
No matter how you cut it, game state wins is a biased stat, not as bad as RBIs, but similar to it. By definition, players from winning teams will, in aggregate, have more game state wins than those on losing teams. So game state wins don't really compare players across teams as much as they distribute credit (or blame) for success among members of the same team. For all of this, you can't say it isn't interesting to see who scores high and low for producing wins.
Oh, and one more pot shot, at WAR. Or, more specifically, the AR in WAR. I find the use of replacement player values to be arbitrary and misguided. I use replacement player analysis a lot, but come from a different perspective. Let's say that you want to produce a number that shows how many runs a batter produced over the course of a season. What's the lowest number any batter can produce? That's easy, it's zero. If they never get on base the whole season, then they produce nothing, zero. A nice base. But what if we want to say how many runs a pitcher prevented? ERA does a pretty good job of saying how many runs they allowed, but how many did they prevent? The problem here is that there is no ceiling on runs allowed, the way that zero provides a floor for the number of runs a batter produces. Positive infinity is no good as a ceiling; that just makes everybody the same. So what to do? Establish a replacement pitcher value of runs allowed. Then when a pitcher allows fewer runs than that value, you can say he prevented that number of runs over a replacement pitcher.
I use the replacement number as the value at which 90% of all MLB plate appearances have a batter (or pitcher) who performs better than that value. I find separate replacement levels for each position and league. That says that 10% of all plate appearances use a player who is a replacement player. Note that more than 10% of all players are replacement players, because replacement players never appear as often as non-replacement players.
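One way to read that definition is as a 10th percentile weighted by plate appearances. The sketch below is a rough illustration under that assumption, not the actual procedure used for the tables here. The function name, the generic "rate" (any per-player performance measure where higher is better), and the sample numbers are all made up for the example. It expects the caller to have already grouped players by position and league.

```python
def replacement_level(players, fraction=0.10):
    """Return the rate at which `fraction` of all plate appearances
    belong to players performing at or below that rate.

    players: list of (rate, plate_appearances) tuples for one position and league.
    """
    total_pa = sum(pa for _, pa in players)
    cumulative = 0
    # Walk up from the worst performer, accumulating plate appearances.
    for rate, pa in sorted(players):
        cumulative += pa
        if cumulative >= fraction * total_pa:
            return rate
    raise ValueError("no players supplied")


# Hypothetical group: two part-timers and two regulars.
sample = [(0.55, 60), (0.60, 60), (0.70, 400), (0.85, 500)]
level = replacement_level(sample)
print(level)  # 0.6

# Two of the four players sit at or below the level, but they account
# for only 120 of 1020 plate appearances.
at_or_below = [p for p in sample if p[0] <= level]
print(len(at_or_below), sum(pa for _, pa in at_or_below))  # 2 120
```

The made-up sample also shows the point about head counts: half the players are at or below the replacement level, yet they take only about a tenth of the plate appearances.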
Having developed replacement player values by necessity for pitchers, the same can be done out of choice for batters. It's all good and gives us nice positive numbers for all players doing better than the replacement level. However, measuring against a replacement level, the way that WAR does, is really not necessary when it comes to game state wins. The same way that for every game there is one winning team and one losing team, and that if you add up the won/lost number for every team it always works out to 0.500, the same is true for game state wins. On every play, the number of game state wins earned by the batter is offset exactly by the other team's pitcher. The sum total of all game state wins for all players on each and every play is zero. The average number of game state wins over a season for all players is zero. Some players have positive numbers, meaning they contributed more to wins than losses, while others are negative. Same way some teams are above .500, others below. Computing game state wins based on replacement player levels just gives all non-replacement players positive numbers. It allows you to say, for example, that of the Yankees 90 wins, 9 of them were produced (or won't be) by Robinson Cano. But it never allows you to say that player X actually cost his team some games, and how many.
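The zero-sum bookkeeping can be sketched as follows. This is a simplified illustration that assumes every play is credited only to the batter and the opposing pitcher, as described above. The function name, the play tuple layout, and the player labels are hypothetical, and the second play's win probabilities are invented for the example.

```python
from collections import defaultdict


def credit_plays(plays):
    """Accumulate raw game state wins per player.

    plays: iterable of (batter, pitcher, wp_before, wp_after), where the win
    probabilities are from the batting team's point of view.
    """
    gsw = defaultdict(float)
    for batter, pitcher, wp_before, wp_after in plays:
        delta = wp_after - wp_before
        gsw[batter] += delta   # batter gains (or loses) the change
        gsw[pitcher] -= delta  # pitcher is charged the exact opposite
    return dict(gsw)


plays = [
    ("Batter A", "Pitcher X", 0.713, 0.748),  # the leadoff single from the earlier example
    ("Batter B", "Pitcher X", 0.748, 0.700),  # invented follow-up play that hurts the offense
]
totals = credit_plays(plays)
for name, value in totals.items():
    print(f"{name}: {value:+.3f}")
print(f"sum: {abs(sum(totals.values())):.3f}")  # sum: 0.000
```

Because each play adds and subtracts the same amount, the totals sum to zero no matter how many plays are fed in, which is why the league-wide average of raw game state wins is zero.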
My big gripe with WAR, which uses replacement levels by choice and not necessity, is that it is the only stat commonly stated with reference to replacement players, while some other stats (like pitcher runs prevented) that have to use replacement levels by necessity and not choice have no popularity.
Top 10 Batters

name | team | hits | walks | ob | ops | gsw | adjGSW |
---|---|---|---|---|---|---|---|
Shin-Soo Choo | CIN | 162 | 112 | 274 | 0.885 | 11.4854 | 15.2957 |
Chris Davis | BAL | 167 | 72 | 239 | 1.004 | 9.83751 | 10.3378 |
Miguel Cabrera | DET | 193 | 90 | 283 | 1.078 | 9.70031 | 11.1218 |
Paul Goldschmidt | ARI | 182 | 99 | 281 | 0.952 | 9.43188 | 10.3659 |
Freddie Freeman | ATL | 176 | 66 | 242 | 0.897 | 7.61333 | 8.44083 |
Matt Carpenter | STL | 199 | 72 | 271 | 0.873 | 7.35043 | 9.28193 |
Robinson Cano | NYY | 190 | 65 | 255 | 0.899 | 6.90242 | 9.15683 |
Josh Donaldson | OAK | 174 | 76 | 250 | 0.883 | 6.44557 | 7.90415 |
Brett Gardner | NYY | 147 | 52 | 199 | 0.759 | 6.39259 | 6.95362 |
Eric Hosmer | KCR | 188 | 51 | 239 | 0.801 | 6.13712 | 6.64266 |
So who were the top win producers of 2013? The Top 10 Batters table lists the top 10 batting Game State Win producers for 2013. The column marked gsw is the number of raw GSW, not adjusted for replacement players. The average raw GSW for all players is zero, because on every play the number of GSW gained or lost by the batter is offset by the pitcher. So raw GSW might be thought of as games above .500 or games over the average player.
The column marked adjGSW is the GSW above what a replacement player for that position and league would achieve. Shin-Soo Choo gets an extra large bump in adjGSW because the replacement rate for NL center fielders is so low.
So the batting top 10 is littered with some usual suspects, the high OPS guys from winning teams, with lots of plate appearances. All are from winning teams (.500 for ARI). But Shin-Soo Choo at the top? And not just on a position- and league-adjusted replacement level? And not by a little bit? And for a leadoff hitter? I did a manual check on his numbers, and they seem legit. He's not the only leadoff hitter on the list. It is also interesting to compare him with his beleaguered teammate Joey Votto, who got on base more than anyone in the majors in 2013, had a .926 OPS, and a very decent 5.45 GSW.
Top 10 Pitchers

name | team | position | outs | ERA | gsw | adjGSW |
---|---|---|---|---|---|---|
Clayton Kershaw | LAD | SP | 708 | 1.83 | 3.95949 | 8.64498 |
Greg Holland | KCR | SRP | 201 | 1.21 | 2.6259 | 4.96792 |
Jose Fernandez | MIA | SP | 518 | 2.19 | 2.5566 | 6.07072 |
Brad Ziegler | ARI | LRP | 219 | 2.22 | 2.53384 | 5.37358 |
Santiago Casilla | SFG | LRP | 150 | 2.16 | 2.28578 | 4.27455 |
Matt Harvey | NYM | SP | 535 | 2.27 | 2.17415 | 5.73471 |
David Robertson | NYY | LRP | 199 | 2.04 | 2.14892 | 4.28636 |
Joe Nathan | TEX | SRP | 194 | 1.39 | 2.04826 | 4.34436 |
Francisco Liriano | PIT | SP | 483 | 3.02 | 2.0175 | 5.45938 |
David Carpenter | ATL | LRP | 197 | 1.78 | 1.96728 | 4.415 |
Now on the pitching side. Kershaw is the king, and there's no surprise in that. There's a nice mix of starters, long relievers, and closers in the list. The presence of long relievers warms my heart. These guys don't rack up the number of outs that starters do, so to earn the GSWs that they do they need to come in the game with a lot on the line and bail out the situation. Glad to see that it happens.
It's actually pretty hard for a starter to pile up a lot of GSW. They need to pitch very effectively, and they need to pitch in tight games. Pitching with a 4-run lead does you no good here. Also, starters can't have many bad innings, because bad innings can wipe you out much faster than good innings accumulate.
It's a bit cruel, and a bit like watching a car crash, but it's part of the reason we do rankings, so here are the bottom 10 hitters and pitchers for 2013. Note the preponderance of starters in the bottom pitchers list, but it's the closer on the list that's interesting. How does Jim Johnson, who pitched through only 211 outs with a good ERA of 2.94, make this list?
Bottom 10 Batters

name | team | hits | walks | ob | ops | gsw | adjGSW |
---|---|---|---|---|---|---|---|
Jeff Mathis | MIA | 42 | 21 | 63 | 0.535 | -3.03042 | -1.35993 |
Tyler Flowers | CHW | 50 | 14 | 64 | 0.603 | -2.81309 | -1.48438 |
Rickie Weeks | MIL | 73 | 40 | 113 | 0.663 | -2.79431 | -1.71677 |
Pete Kozma | STL | 89 | 34 | 123 | 0.548 | -2.72873 | -1.13874 |
Alcides Escobar | KCR | 142 | 19 | 161 | 0.559 | -2.69887 | 0.00000 |
Roger Bernadina | PHI | 14 | 4 | 18 | 0.603 | -2.58402 | -1.24611 |
Placido Polanco | MIA | 98 | 23 | 121 | 0.617 | -2.50964 | -1.86405 |
Elliot Johnson | KCR | 29 | 8 | 37 | 0.458 | -2.43688 | -1.52651 |
Juan Lagares | NYM | 95 | 20 | 115 | 0.633 | -2.25839 | -0.00000 |
Luis Cruz | NYY | 10 | 1 | 11 | 0.424 | -2.14243 | -1.73472 |

Bottom 10 Pitchers

name | team | pos | outs | ERA | gsw | adjGSW |
---|---|---|---|---|---|---|
Dylan Axelrod | CHW | SP | 385 | 5.68 | -4.74985 | -1.11770 |
Joe Blanton | LAA | SP | 398 | 6.04 | -4.70454 | -0.91743 |
Lucas Harrell | HOU | SP | 461 | 5.86 | -4.70428 | -0.32214 |
CC Sabathia | NYY | SP | 633 | 4.78 | -4.37933 | 1.24865 |
Edwin Jackson | CHC | SP | 526 | 4.98 | -4.2987 | -0.28920 |
Jim Johnson | BAL | SRP | 211 | 2.94 | -4.15808 | -1.31091 |
Ian Kennedy | SDP | SP | 172 | 4.24 | -4.09722 | 0.00000 |
Joe Saunders | SEA | SP | 549 | 5.26 | -3.91646 | 1.16608 |
Barry Zito | SFG | SP | 400 | 5.74 | -3.7277 | -0.59028 |
Scott Diamond | MIN | SP | 393 | 5.43 | -3.7102 | -0.14002 |