A way to measure fielding

Seven factor FIP inning state runs (7fFIPisr) measures the number of runs saved or allowed due to fielding performance. The data used to compute it is publicly available from Retrosheet play-by-play game logs.

Let

     defense = pitching + fielding.              (1)

so

     fielding = defense – pitching.              (1a)

The total defense is easily measurable as the total number of runs per game the team yields. If we can say precisely how many runs are allowed strictly by the pitching, then we will have a good handle on the impact of fielding.

If ERA measures the pitching performance, then we can apply Eq. (1a) (fielding = defense – pitching). Labeling the number of runs per game given up by the defense as the actual run average, or ARA, the unearned run average, or UERA, is

     UERA = ARA – ERA                            (2)

The major flaw of this measure is that it considers only errors by the fielders and not the extraordinary plays by fielders that save runs. Only half of the fielding effects are accounted for.

The work of Voros McCracken, who introduced defense-independent pitching statistics (DIPS), stated that pitchers have full control of the outcome of plays only when the event is a strikeout, walk (or HBP), or home run. Otherwise, the pitcher has only reduced control over the outcome. This work was extended by FIP, fielding independent pitching, which is a measure based on the DIPS factors, regressed and normalized on ERA. FIP values are published for each pitcher on Baseball Reference. FIP was in turn extended by xFIP, which stated that home run rates for each pitcher were unstable over time and that every fly ball was a potential home run, and so replaced the home run rate with the fly ball rate. The equation Baseball Reference used to determine FIP values is

     FIP = [(13 HR + 3 (BB+HBP) – 2 K) / IP] + C     (3)

where C is a constant that normalizes the FIP to the league ERA.
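
As a minimal sketch of Eq. (3), the function below computes FIP from a pitcher's counting stats. It assumes the constant C is added as an offset, which is the usual published form, and that C is obtained by forcing the league FIP to equal the league ERA. The function names and the sample numbers are hypothetical and only illustrate the arithmetic.

```python
def fip_core(hr, bb, hbp, k, ip):
    """Un-normalized part of Eq. (3): (13 HR + 3 (BB+HBP) - 2 K) / IP."""
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip


def league_constant(lg_era, lg_hr, lg_bb, lg_hbp, lg_k, lg_ip):
    """Assumed normalization: choose C so that league FIP equals league ERA."""
    return lg_era - fip_core(lg_hr, lg_bb, lg_hbp, lg_k, lg_ip)


def fip(hr, bb, hbp, k, ip, c):
    """FIP with the league constant c added as an offset."""
    return fip_core(hr, bb, hbp, k, ip) + c


# Hypothetical pitcher line, not taken from the article's data.
example_fip = fip(hr=10, bb=40, hbp=5, k=200, ip=200.0, c=3.10)
```

With these made-up inputs the core term is (130 + 135 – 400) / 200 = –0.675, so the result is 3.10 – 0.675 = 2.425.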

The "everything" that's not a K, BB, or HR is referred to as a "ball in play" and the rate at which balls in play fall in for hits is BABIP, batting average for balls in play. Roughly,

     BABIP = (H – HR) / (AB – HR – K)            (4)
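
Eq. (4) can be sketched in a few lines. This follows the rough form given above (no sacrifice flies in the denominator); the function name and the sample batting line are hypothetical.

```python
def babip(h, hr, ab, k):
    """Rough BABIP from Eq. (4): (H - HR) / (AB - HR - K)."""
    balls_in_play = ab - hr - k
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (h - hr) / balls_in_play


# Hypothetical team line: 150 hits, 20 home runs, 550 at bats, 100 strikeouts.
example_babip = babip(h=150, hr=20, ab=550, k=100)  # 130 / 430
```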

What's most striking about these measures is the commonality in their names: each claims to be independent of fielding. It stands to reason, then, that everything that these measures do not account for must be somewhat dependent on fielding. I'm going to call the excess runs allowed by the defense when the pitcher is on the mound runs over FIP, abbreviated roFIP. Its equation is

     roFIP = ARA – FIP                           (5)
Figure 1 - Team fielding v. BABIP

Now just because FIP is independent of fielding does not necessarily imply that roFIP is independent of pitching. In fact, it is most likely to be a combination of pitching and fielding. The question is how much of each? That is, do some pitchers produce consistently weaker bat contact, and therefore easier fielding chances, than others?

Using BABIP as a proxy of fielding performance, Figure 1 shows the team fielding performance as measured by UERA & roFIP as a function of BABIP.

Positive values on the y axis indicate runs allowed, while negative values show runs saved due to fielding. So if high BABIP is an indication of poor fielding, then we expect the lines to have a positive slope.

The unearned run average line (in blue on Fig 1) is practically flat. This means that whether you are a good fielding team or a bad one, you give up 0.30-0.40 unearned runs per game. This implies that errors are a poor way to measure fielding and, significantly, that ERA is therefore a poor way to express pure pitching performance.

The roFIP line (in orange in Fig 1) has a significant positive slope. The regression line for roFIP ranges from a minimum of -0.1 runs allowed per game for the best fielding teams (BABIP=.245) to a maximum of +0.75 runs allowed per game for the worst fielding team (BABIP=.277). This works out to a range of about -16 to +120 runs allowed over the course of a season. This is the maximum possible range (or greatest slope) of fielding performance variation. If any part of roFIP is attributable to pitching, then the range of runs allowed (or the slope of the regression line) attributable to fielding would still be positive, but not as large as the slope in Figure 1.
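
The per-game to per-season conversion, and the slope implied by the two endpoints of the roFIP regression line, can be checked with a short sketch. The 162-game season length is an assumption not stated in the text, and the variable names are illustrative.

```python
GAMES_PER_SEASON = 162  # assumption: a full MLB regular season

# Endpoints of the roFIP regression line quoted in the text.
best_babip, best_rofip = 0.245, -0.10    # best fielding teams
worst_babip, worst_rofip = 0.277, 0.75   # worst fielding team

# Slope of the line through the two endpoints (runs per game per unit of BABIP).
slope = (worst_rofip - best_rofip) / (worst_babip - best_babip)

# Runs per game scaled up to a full season.
season_best = best_rofip * GAMES_PER_SEASON
season_worst = worst_rofip * GAMES_PER_SEASON
```

Under that assumption the season totals come out to about -16.2 and +121.5 runs, in line with the rounded -16 to +120 range quoted above, and the slope is roughly 0.27 runs per game for every ten points of BABIP.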

Also annoying but explainable is that, on average, teams allow a positive number of fielding runs. One might expect the average fielding team to give up zero fielding runs, good teams to allow a negative number of fielding runs, and poor fielding teams to allow a positive number of runs. The positive bias in roFIP is explained by the fact that, by definition, FIP is normalized to produce a value whose average equals the league ERA. The roFIP line cuts the UERA line right at the midpoint of both lines. If you were to subtract UERA from roFIP you would get a line centered around zero runs allowed, but still with a significant slope.

Let's look at two popular modern fielding measures to compare how they attribute pitching to roFIP. These measures have been published on Baseball Reference: Total Zone Fielding Runs (TZFR) from BaseballProjection.com and Defensive Runs Saved (DRS) from Baseball Info Solutions. Both look to measure the expectation of a play's outcome based on how and where the ball was hit; Total Zone appears to use information available from the play-by-play logs from Retrosheet, while Defensive Runs Saved uses game video to more exactly characterize the nature of the batted ball. While I appreciate the precision that DRS provides, I like even more the fact that Total Zone uses widely public information to produce its results.

The regression lines for TZFR & DRS fall practically on top of each other. This is despite the fact that they correlate poorly with each other. This suggests that both measures perform equally well in measuring fielding, but that they do it very differently.

The TZFR & DRS lines in Figure 1 both have positive slope, but the numerical value of their slope is only one half the value of the roFIP slope.

Based on the regression lines, the fielding difference between the best and worst fielding teams using TZFR & DRS is about 0.4 runs per game in the NL, and about 0.6 runs per game in the AL. Over a season this works out to 70 – 90 runs. Compare this to the usual 550 - 750 total runs per season.

The notion of FIP is adjusted to account for this slope difference. The first adjustment is to normalize the new FIP to the actual number of runs scored instead of just earned runs. This FIP is now not as useful for seeing whether a pitcher's ERA will trend back to its FIP, but its residuals will be more useful in measuring fielding.

The second adjustment is inspired by xFIP. xFIP said that pitchers' home run rates were not stable in time, but fly ball rates were, and it's just a matter of some luck as to whether a fly ball becomes a home run or not. Let's look at it from the other side. The question is how much of BABIP is due to good or bad fielding, and how much is due to the pitcher allowing easier balls to field?

Figure 2 - Play Types - 2014

To measure this, take the opposite approach to xFIP and separate home runs from fly balls. Then, go even further, and separate out ground balls, line drives, and pop-ups. These event types are important because the play-by-play logs in Retrosheet differentiate between batted balls in this way. Also, the probability of a hit varies tremendously between these event types. Figure 2 shows how frequently each event type occurs, and whether the event leads to an out or to the batter reaching base.

A pop-up is very different from a fly ball in that it is almost a guaranteed out. A line drive is the anti-pop-up in that its likelihood of falling in for a hit is far greater than that of a fly ball. A home run goes even further as an anti-pop-up, as it is certain not to be an out.

So now we have seven factors to measure pitching performance against, which are in order of their contribution to the regression:

  1. Strikeouts
  2. Walks
  3. Home Runs (now we are at FIP)
  4. Line drives
  5. Fly balls
  6. Ground balls
  7. Pop-ups

The regression equation for this seven factor FIP (abbreviated as 7fFIP) is given below (see footnote 1):

     7fFIP = 2.354 – (13.74 K) + (15.40 BB) + (37.25 HR) + (7.30 LD) + (4.72 FB) + (1.77 GB) – (5.42 PU)     (6)
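
Eq. (6) can be sketched as a small function. The text does not state the units of the seven factors; the sketch assumes each factor is a rate per play (the count divided by the "plays" column of the tables in this article). Under that assumption it reproduces the tabulated 2014 values for Aroldis Chapman (-1.53) and Clayton Kershaw (0.83). The names `COEFFS`, `INTERCEPT`, and `seven_factor_fip` are illustrative.

```python
# Coefficients and intercept copied from Eq. (6).
INTERCEPT = 2.354
COEFFS = {
    "K": -13.74, "BB": 15.40, "HR": 37.25,
    "LD": 7.30, "FB": 4.72, "GB": 1.77, "PU": -5.42,
}


def seven_factor_fip(plays, counts):
    """7fFIP from raw counts, assuming each factor is a per-play rate."""
    if plays <= 0:
        raise ValueError("plays must be positive")
    return INTERCEPT + sum(
        coef * counts[name] / plays for name, coef in COEFFS.items()
    )


# Counts taken from the 2014 pitcher tables in this article.
chapman = seven_factor_fip(
    202, {"K": 106, "BB": 26, "HR": 1, "FB": 13, "GB": 31, "LD": 20, "PU": 5}
)
kershaw = seven_factor_fip(
    749, {"K": 239, "BB": 33, "HR": 9, "FB": 66, "GB": 251, "LD": 107, "PU": 44}
)
```

The same function applies unchanged to a batter's line, since the batting tables use the same columns.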

And the MLB leaders in 7fFIP for 2014, for pitchers with a minimum of 100 plate appearances, are:

| name            | team | plays | K    | BB   | HR   | FB   | GB   | LD   | PU   | 7fFIP  | era  |
+-----------------+------+-------+------+------+------+------+------+------+------+--------+------+
| Aroldis Chapman | CIN  |   202 |  106 |   26 |    1 |   13 |   31 |   20 |    5 | -1.53  | 2.00 |
| Dellin Betances | NYY  |   341 |  136 |   26 |    4 |   33 |   87 |   35 |   20 | -0.18  | 1.40 |
| Ken Giles       | PHI  |   166 |   64 |   11 |    1 |   17 |   42 |   21 |   10 | -0.17  | 1.18 |
| Wade Davis      | KCR  |   279 |  109 |   26 |    0 |   24 |   71 |   33 |   16 | -0.17  | 1.00 |
| Andrew Miller   | BAL  |   242 |  103 |   22 |    3 |   26 |   58 |   23 |    7 | -0.16  | 1.35 |
| Sean Doolittle  | OAK  |   236 |   89 |    8 |    5 |   43 |   33 |   33 |   25 | 0.04   | 2.73 |
| Greg Holland    | KCR  |   240 |   90 |   20 |    3 |   25 |   64 |   26 |   12 | 0.43   | 1.44 |
| Brad Boxberger  | TBR  |   247 |  104 |   24 |    9 |   28 |   53 |   18 |   11 | 0.63   | 2.37 |
| Craig Kimbrel   | ATL  |   243 |   95 |   28 |    2 |   26 |   52 |   31 |    9 | 0.68   | 1.61 |
| Kenley Jansen   | LAD  |   268 |  101 |   19 |    5 |   33 |   54 |   43 |   13 | 0.81   | 2.76 |
| Jake McGee      | TBR  |   274 |   90 |   18 |    2 |   50 |   63 |   31 |   20 | 0.82   | 1.89 |
| Clayton Kershaw | LAD  |   749 |  239 |   33 |    9 |   66 |  251 |  107 |   44 | 0.83   | 1.77 |
| David Robertson | NYY  |   259 |   96 |   24 |    7 |   24 |   65 |   34 |    9 | 1.35   | 3.08 |
| Pat Neshek      | STL  |   255 |   68 |   11 |    4 |   46 |   65 |   31 |   30 | 1.49   | 1.87 |
| Zach Duke       | MIL  |   237 |   74 |   17 |    3 |   17 |   87 |   33 |    6 | 1.51   | 2.45 |
| Joaquin Benoit  | SDP  |   205 |   64 |   15 |    3 |   39 |   48 |   24 |   12 | 1.59   | 1.49 |
| Joakim Soria    | DET  |   182 |   48 |    8 |    2 |   26 |   57 |   27 |   14 | 1.71   | 4.91 |
| Chris Sale      | CHW  |   685 |  208 |   50 |   13 |   98 |  182 |   88 |   46 | 1.73   | 2.17 |
| Cody Allen      | CLE  |   279 |   91 |   27 |    7 |   45 |   64 |   26 |   19 | 1.78   | 2.07 |
| Josh Fields     | HOU  |   231 |   70 |   19 |    2 |   40 |   47 |   38 |   15 | 1.81   | 4.45 |

and for pitchers with a minimum of 500 plate appearances,

| name              | team | plays | K    | BB   | HR   | FB   | GB   | LD   | PU   | 7fFIP  | era  |
+-------------------+------+-------+------+------+------+------+------+------+------+--------+------+
| Clayton Kershaw   | LAD  |   749 |  239 |   33 |    9 |   66 |  251 |  107 |   44 | 0.83   | 1.77 |
| Chris Sale        | CHW  |   685 |  208 |   50 |   13 |   98 |  182 |   88 |   46 | 1.73   | 2.17 |
| Corey Kluber      | CLE  |   951 |  269 |   57 |   14 |  115 |  310 |  146 |   40 | 1.98   | 2.44 |
| Jake Arrieta      | CHC  |   614 |  167 |   44 |    5 |   59 |  207 |  106 |   26 | 2.10   | 2.53 |
| Felix Hernandez   | SEA  |   912 |  248 |   51 |   16 |   93 |  351 |  128 |   25 | 2.17   | 2.14 |
| Carlos Carrasco   | CLE  |   529 |  140 |   32 |    7 |   64 |  193 |   80 |   13 | 2.33   | 2.55 |
| David Price       | DET  |  1009 |  271 |   43 |   25 |  167 |  300 |  151 |   52 | 2.36   | 3.59 |
| Stephen Strasburg | WSN  |   868 |  242 |   48 |   23 |  106 |  278 |  140 |   31 | 2.49   | 3.14 |
| Max Scherzer      | DET  |   904 |  252 |   69 |   18 |  170 |  223 |  125 |   47 | 2.49   | 3.15 |
| Jon Lester        | OAK  |   885 |  220 |   53 |   16 |  133 |  273 |  132 |   58 | 2.52   | 2.35 |
| Phil Hughes       | MIN  |   855 |  186 |   21 |   16 |  156 |  243 |  155 |   78 | 2.63   | 3.52 |
| Zack Greinke      | LAD  |   821 |  207 |   45 |   19 |   77 |  283 |  149 |   41 | 2.70   | 2.71 |
| Jacob deGrom      | NYM  |   565 |  144 |   44 |    7 |   75 |  178 |   96 |   21 | 2.74   | 2.69 |
| Masahiro Tanaka   | NYY  |   542 |  141 |   27 |   15 |   70 |  180 |   88 |   21 | 2.75   | 2.77 |
| Jordan Zimmermann | WSN  |   800 |  182 |   35 |   13 |   97 |  245 |  169 |   59 | 2.76   | 2.66 |
| Garrett Richards  | LAA  |   678 |  164 |   58 |    5 |   84 |  236 |  107 |   24 | 2.78   | 2.61 |
| Collin McHugh     | HOU  |   619 |  157 |   47 |   13 |   81 |  187 |   97 |   37 | 2.79   | 2.73 |
| Madison Bumgarner | SFG  |   872 |  219 |   49 |   21 |  121 |  281 |  141 |   40 | 2.82   | 2.98 |
| Johnny Cueto      | CIN  |   961 |  242 |   80 |   22 |  104 |  306 |  148 |   59 | 2.89   | 2.25 |
| Alex Wood         | ATL  |   694 |  170 |   51 |   16 |  104 |  227 |   96 |   30 | 3.04   | 2.78 |

What's good for the pitcher is good for the batter. 7fFIP can be applied to batting plate appearances. It measures how hard the batter hits the ball, regardless of its outcome. The top twenty MLB hitters in 7fFIP are:

| name              | team | plays | K    | BB   | HR   | FB   | GB   | LD   | PU   | 7fFIP  | ops   |
+-------------------+------+-------+------+------+------+------+------+------+------+--------+-------+
| Victor Martinez   | DET  |   636 |   40 |   72 |   32 |  128 |  220 |  121 |   23 | 7.86   | 0.974 |
| Michael Brantley  | CLE  |   676 |   56 |   60 |   20 |   98 |  267 |  161 |   14 | 6.69   | 0.890 |
| Jose Bautista     | TOR  |   673 |   95 |  113 |   35 |   95 |  192 |   89 |   54 | 6.64   | 0.928 |
| David Ortiz       | BOS  |   602 |   95 |   78 |   35 |  114 |  165 |   89 |   26 | 6.57   | 0.873 |
| Buster Posey      | SFG  |   605 |   69 |   50 |   22 |  106 |  209 |  134 |   15 | 6.34   | 0.854 |
| Edwin Encarnacion | TOR  |   535 |   81 |   64 |   34 |   89 |  142 |   78 |   47 | 6.33   | 0.901 |
| Adam LaRoche      | WSN  |   586 |  108 |   84 |   26 |   90 |  148 |  111 |   19 | 6.06   | 0.817 |
| Albert Pujols     | LAA  |   695 |   71 |   53 |   28 |  110 |  269 |  123 |   41 | 6.03   | 0.790 |
| Robinson Cano     | SEA  |   665 |   68 |   67 |   14 |   76 |  284 |  139 |   17 | 5.97   | 0.836 |
| Adrian Beltre     | TEX  |   617 |   75 |   60 |   19 |  106 |  207 |  122 |   28 | 5.93   | 0.879 |
| Anthony Rizzo     | CHC  |   616 |  116 |   88 |   32 |   80 |  155 |  108 |   37 | 5.91   | 0.913 |
| Jonathan Lucroy   | MIL  |   655 |   71 |   68 |   13 |  100 |  220 |  148 |   35 | 5.88   | 0.837 |
| Justin Morneau    | COL  |   550 |   60 |   40 |   17 |   69 |  200 |  133 |   31 | 5.82   | 0.860 |
| Matt Holliday     | STL  |   667 |  100 |   91 |   20 |   96 |  218 |  112 |   30 | 5.75   | 0.811 |
| Andrew McCutchen  | PIT  |   649 |  115 |   95 |   24 |   95 |  179 |  111 |   30 | 5.73   | 0.952 |
| Neil Walker       | PIT  |   571 |   88 |   56 |   23 |   95 |  171 |  107 |   31 | 5.64   | 0.809 |
| Matt Carpenter    | STL  |   706 |  110 |  103 |    8 |  111 |  210 |  149 |   15 | 5.58   | 0.750 |
| Coco Crisp        | OAK  |   535 |   66 |   66 |    9 |   89 |  166 |  103 |   36 | 5.56   | 0.699 |
| Adrian Gonzalez   | LAD  |   660 |  112 |   58 |   27 |  109 |  193 |  138 |   23 | 5.53   | 0.817 |
| Melky Cabrera     | TOR  |   621 |   68 |   46 |   16 |   90 |  250 |  122 |   29 | 5.53   | 0.808 |

Figure 2 - Team fielding - add ro7fFIP

Back to fielding: call the runs over the seven factor FIP ro7fFIP.

     ro7fFIP = ARA – 7fFIP                       (7)

This is similar to roFIP but with the intent that 7fFIP incorporates more of the pitching contribution to defense by accounting for the type of batted ball hit. Figure 2 shows the chart of ro7fFIP for each team with the league regression line.

The ro7fFIP line has a positive slope but not nearly as steep as the roFIP line. The values of ro7fFIP range from -0.2 for the best fielding teams to +0.2 runs per game for the worst.

The ro7fFIP line falls just about on top of the TZFR and DRS lines, having similar slopes and all crossing the x axis near the center of the line.

ro7fFIP is useful for rating team fielding performance, but it cannot allocate the team performance among the individual fielders on the team. For this, the inning state transition method is used, extended to the factors used in 7fFIP. The classic inning state run (ISR) considers the beginning and end inning states for each plate appearance and takes the difference in expected runs between the two states to come up with an ISR for that PA.

The 7fEvent groups all plays with the same start inning state, the same 7fFIP factor (GB, FB, HR, …), and, for ground balls, fly balls, and line drives, the same location where the ball was hit (see footnote 2). For each of these event groups we compute the average measured ISR. Let's abbreviate the average measured ISR for a 7fEvent group as ξ (see footnote 3). So

     7fEvent = f(startInningState, 7fFIP factor, location)                  (8)

     ξ(7fEvent) = avg(isr for all plays belonging to the 7fEvent group)     (9)
Figure 3 - Team fielding - add 7fFIPisr

For each play this average ISR, ξ(7fEvent), is allocated to the pitcher. Then for each play the residual ISR is computed as the actual play ISR minus the ξ(7fEvent) of the play. This residual ISR is awarded (or charged) to the fielder. In the case of balls hit in the hole between fielders, both fielders are charged one half of the residual ISR for the play. I'm going to abbreviate these residual ISRs as 7fFIPisrs. So, for each play,

     7fFIPisr = isr – ξ(7fEvent)                        (10)
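
Eqs. (8) through (10) can be sketched as a group-by followed by a residual split. This is a minimal illustration, not the author's implementation: the play records, field names, and player labels are hypothetical, and each play's ISR is taken as an already-computed input. Splitting the residual equally among the listed fielders matches the one-half rule for balls hit in the hole.

```python
from collections import defaultdict


def event_key(play):
    """Eq. (8): a 7fEvent is (start inning state, 7fFIP factor, location)."""
    return (play["start_state"], play["factor"], play["location"])


def average_isr_by_event(plays):
    """Eq. (9): xi = average ISR over all plays in the same 7fEvent group."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of isr, play count]
    for play in plays:
        bucket = totals[event_key(play)]
        bucket[0] += play["isr"]
        bucket[1] += 1
    return {key: total / n for key, (total, n) in totals.items()}


def allocate(plays):
    """Credit xi to the pitcher and the residual (Eq. 10) to the fielder(s)."""
    xi = average_isr_by_event(plays)
    pitcher_isr = defaultdict(float)
    fielder_isr = defaultdict(float)
    for play in plays:
        expected = xi[event_key(play)]
        pitcher_isr[play["pitcher"]] += expected
        if not play["fielders"]:
            continue  # assumption: no fielder involved, residual left unassigned
        share = (play["isr"] - expected) / len(play["fielders"])
        for fielder in play["fielders"]:
            fielder_isr[fielder] += share
    return dict(pitcher_isr), dict(fielder_isr)


# Two hypothetical plays in the same event group (made-up ISR values).
sample_plays = [
    {"start_state": "0 out, bases empty", "factor": "GB", "location": "SS-3B Hole",
     "isr": -0.25, "pitcher": "P1", "fielders": ["SS1"]},
    {"start_state": "0 out, bases empty", "factor": "GB", "location": "SS-3B Hole",
     "isr": 0.50, "pitcher": "P1", "fielders": ["SS1", "3B1"]},
]
pitcher_totals, fielder_totals = allocate(sample_plays)
```

In this toy group ξ is 0.125, so the pitcher is assigned 0.25 over the two plays, and the fielder residuals sum to zero across the group, as they must when ξ is the group average.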

The 7fFIPisr rates for each team, along with the regression line for the league, are shown in Figure 3.

This line is very similar to the ro7fFIP, TZFR & DRS lines. It has a positive slope similar to the three other lines, and has a midpoint around zero runs allowed.

Applying the analysis to individuals, the top 10 fielders for 2014 are:

| name           | team | position | po   | a    | e    | DP   | Rtot | Rdrs | 7fFIPisr |
+----------------+------+----------+------+------+------+------+------+------+----------+
| Jason Heyward  | ATL  | RF       |  365 |    9 |    1 |    2 |   30 |   32 |    29.71 |
| Juan Uribe     | LAD  | 3B       |   60 |  215 |    6 |   25 |   10 |   17 |    27.14 |
| Josh Donaldson | OAK  | 3B       |  131 |  328 |   23 |   43 |   22 |   20 |   26.775 |
| Alex Gordon    | KCR  | LF       |  341 |    8 |    2 |    0 |   25 |   27 |   25.695 |
| Nolan Arenado  | COL  | 3B       |   69 |  280 |   15 |   31 |    8 |   16 |    24.13 |
| Chase Headley  | NYY  | 3B       |   77 |  234 |    8 |   25 |   17 |   13 |   23.815 |
| Kyle Seager    | SEA  | 3B       |   87 |  327 |    8 |   36 |   23 |   10 |    22.57 |
| Ender Inciarte | ARI  | CF       |  198 |    8 |    2 |    0 |    4 |   15 |    22.21 |
| Anthony Rendon | WSN  | 3B       |  106 |  235 |   15 |   30 |   10 |   12 |   20.855 |
| Billy Hamilton | CIN  | CF       |  342 |   10 |    2 |    3 |   14 |   14 |   20.105 |

Rtot (TZFR), Rdrs (DRS), and 7fFIPisr all like Jason Heyward. There is pretty good agreement between Rtot, Rdrs, and 7fFIPisr. There is a strange majority of third basemen and no middle infielders. I'll have to look into that.

In part because we all like to rubberneck at the car crash, here are the ten worst fielders for 2014:

| name             | team | position | po   | a    | e    | DP   | Rtot | Rdrs | 7fFIPisr |
+------------------+------+----------+------+------+------+------+------+------+----------+
| Shin-Soo Choo    | TEX  | LF       |  100 |    3 |    3 |    0 |  -16 |   -9 |  -25.645 |
| Dexter Fowler    | HOU  | CF       |  238 |    4 |    5 |    1 |  -17 |  -20 |   -25.29 |
| Danny Santana    | MIN  | CF       |  167 |    5 |    4 |    0 |   -4 |    0 |   -24.59 |
| Asdrubal Cabrera | WSN  | 2B       |   73 |  132 |    1 |   29 |    0 |  -10 |   -21.73 |
| Colby Rasmus     | TOR  | CF       |  234 |    3 |    1 |    0 |   -7 |   -7 |  -21.015 |
| Ben Revere       | PHI  | CF       |  323 |    2 |    4 |    2 |   -6 |  -18 |  -20.325 |
| Grady Sizemore   | PHI  | LF       |   82 |    2 |    1 |    0 |    4 |    2 |   -20.14 |
| Yunel Escobar    | TBR  | SS       |  168 |  267 |   16 |   50 |  -17 |  -24 |  -19.245 |
| Starlin Castro   | CHC  | SS       |  148 |  386 |   15 |   74 |    0 |   -7 |  -18.895 |
| Matt Kemp        | LAD  | LF       |   60 |    1 |    1 |    0 |   -4 |   -8 |  -18.025 |

Lots of negative numbers with this group. Note how error totals do not distinguish the top ten from the bottom ten. The positions also cluster with this group: lots of outfielders and no third basemen.

The analysis continues: Going back to Eq (1), if 7fFIPisr measures fielding, then it stands to reason that ξ(7fEvent) measures pitching.

The top ten pitchers for 2014 by ξ(7fEvent) are:

+-------------------+------+------------+-------+------+
| name              | team | ξ(7fEvent) | isr   | era  |
+-------------------+------+------------+-------+------+
| Corey Kluber      | CLE  | -40.8      | -32.2 | 2.44 |
| Jon Lester        | OAK  | -40.4      | -25.5 | 2.35 |
| Clayton Kershaw   | LAD  | -38.0      | -48.5 | 1.77 |
| Max Scherzer      | DET  | -33.6      | -20.1 | 3.15 |
| Felix Hernandez   | SEA  | -31.7      | -41.0 | 2.14 |
| Chris Sale        | CHW  | -29.1      | -32.3 | 2.17 |
| David Price       | DET  | -29.1      | -11.8 | 3.59 |
| Dellin Betances   | NYY  | -28.7      | -29.5 | 1.40 |
| Jordan Zimmermann | WSN  | -26.0      | -22.9 | 2.66 |
| Cole Hamels       | PHI  | -26.0      | -32.3 | 2.46 |
+-------------------+------+------------+-------+------+

And the ten worst are:

+------------------+------+------------+------+------+
| name             | team | ξ(7fEvent) | isr  | era  |
+------------------+------+------------+------+------+
| Hector Noesi     | CHW  | 27.4       | 19.4 | 4.39 |
| Franklin Morales | COL  | 27.3       | 26.3 | 5.37 |
| Vidal Nuno       | ARI  | 27.0       | 14.5 | 5.42 |
| Edwin Jackson    | CHC  | 26.5       | 35.9 | 6.33 |
| Kyle Kendrick    | PHI  | 26.4       | 13.7 | 4.61 |
| Travis Wood      | CHC  | 24.0       | 30.4 | 5.03 |
| Colby Lewis      | TEX  | 24.0       | 28.3 | 5.18 |
| Andre Rienzo     | CHW  | 23.6       | 27.9 | 6.82 |
| Nick Tepesch     | TEX  | 21.4       | 6.8  | 4.36 |
| Jerome Williams  | PHI  | 20.7       | 11.9 | 9.90 |
+------------------+------+------------+------+------+

And still the analysis continues: What's good for defense is good for offense. What about applying ξ(7fEvent) to the batter on each play as well as the pitcher?

The top ten batters using this measure are:

+-------------------+------+------------+------+-------+
| name              | team | ξ(7fEvent) | isr  | ops   |
+-------------------+------+------------+------+-------+
| Victor Martinez   | DET  | 64.8       | 55.3 | 0.974 |
| Jose Bautista     | TOR  | 48.0       | 47.6 | 0.928 |
| Devin Mesoraco    | CIN  | 46.7       | 46.3 | 0.893 |
| Edwin Encarnacion | TOR  | 44.0       | 34.6 | 0.901 |
| Michael Brantley  | CLE  | 41.6       | 43.9 | 0.890 |
| David Ortiz       | BOS  | 41.1       | 38.5 | 0.873 |
| Adam LaRoche      | WSN  | 40.9       | 27.6 | 0.817 |
| Miguel Cabrera    | DET  | 38.8       | 50.6 | 0.895 |
| Paul Goldschmidt  | ARI  | 38.8       | 36.1 | 0.938 |
| Adrian Gonzalez   | LAD  | 38.7       | 32.4 | 0.817 |
+-------------------+------+------------+------+-------+

And the ten worst are:

+--------------------+------+------------+-------+-------+
| name               | team | ξ(7fEvent) | isr   | ops   |
+--------------------+------+------------+-------+-------+
| Dee Gordon         | LAD  | -30.5      | -7.9  | 0.704 |
| Jackie Bradley     | BOS  | -27.2      | -20.7 | 0.531 |
| James Jones        | SEA  | -23.1      | -18.5 | 0.589 |
| Jose Molina        | TBR  | -21.7      | -27.5 | 0.417 |
| Jonathan Schoop    | BAL  | -21.2      | -22.4 | 0.598 |
| Adeiny Hechavarria | MIA  | -21.0      | -19.5 | 0.664 |
| Derek Jeter        | NYY  | -20.6      | -14.6 | 0.617 |
| B.J. Upton         | ATL  | -19.3      | -15.8 | 0.620 |
| Andrew Romine      | DET  | -19.3      | -16.4 | 0.554 |
| Leury Garcia       | CHW  | -18.7      | -21.0 | 0.399 |
+--------------------+------+------------+-------+-------+

I'll go one last step further: if offensive ξ(7fEvent) measures batting, then the residual offensive isr off of ξ(7fEvent) measures running. There's a certain symmetry to all this.

The ten best runners in 2014 are:

+------------------+------+------+----------+------------+------+------+------+------+
| name             | team | pos  | 7fFIPisr | ξ(7fEvent) | isr  | 3B   | SB   | CS   |
+------------------+------+------+----------+------------+------+------+------+------+
| Adam Eaton       | CHW  | CF   | -29.8    | -8.1       | 21.7 |   10 |   15 |    9 |
| Mike Trout       | LAA  | CF   | -25.4    | 35.5       | 61.0 |    9 |   16 |    2 |
| Starling Marte   | PIT  | LF   | -23.2    | -5.4       | 17.8 |    6 |   30 |   11 |
| Danny Santana    | MIN  | CF   | -23.1    | -8.4       | 14.7 |    7 |   20 |    4 |
| Yasiel Puig      | LAD  | RF   | -22.8    | 7.6        | 30.4 |    9 |   11 |    7 |
| Dee Gordon       | LAD  | 2B   | -22.6    | -30.5      | -7.9 |   12 |   64 |   19 |
| Dexter Fowler    | HOU  | CF   | -21.1    | -4.8       | 16.3 |    4 |   11 |    4 |
| Howie Kendrick   | LAA  | 2B   | -20.8    | 5.4        | 26.2 |    5 |   14 |    5 |
| Christian Yelich | MIA  | LF   | -19.8    | 2.9        | 22.8 |    6 |   21 |    7 |
| Hunter Pence     | SFG  | RF   | -18.7    | 11.4       | 30.1 |   10 |   13 |    6 |
+------------------+------+------+----------+------------+------+------+------+------+

I like the way Dee Gordon shows up on the worst hitter list and on the best runner list. Dee beats out a lot of weak grounders for hits, which raises his poor ξ(7fEvent) to a more respectable overall ISR.

And the worst runners:

+------------------+------+------+----------+------------+-------+------+------+------+
| name             | team | pos  | 7fFIPisr | ξ(7fEvent) | isr   | 3B   | SB   | CS   |
+------------------+------+------+----------+------------+-------+------+------+------+
| Matt Dominguez   | HOU  | 3B   | 18.1     | -13.1      | -31.1 |    0 |    0 |    1 |
| Brian McCann     | NYY  | C    | 16.7     | 23.2       | 6.5   |    1 |    0 |    0 |
| Mark Reynolds    | MIL  | 1B   | 15.5     | 4.4        | -11.1 |    0 |    5 |    1 |
| Carlos Ruiz      | PHI  | C    | 15.0     | 14.7       | -0.3  |    1 |    4 |    2 |
| Yadier Molina    | STL  | C    | 14.5     | 5.2        | -9.2  |    0 |    1 |    1 |
| Nolan Arenado    | COL  | 3B   | 14.3     | 28.5       | 14.2  |    2 |    2 |    1 |
| Alberto Callaspo | OAK  | DH   | 14.1     | -0.3       | -14.4 |    0 |    0 |    1 |
| Martin Prado     | NYY  | 3B   | 13.4     | 13.3       | -0.1  |    0 |    1 |    0 |
| Aramis Ramirez   | MIL  | 3B   | 13.3     | 11.5       | -1.8  |    1 |    3 |    0 |
| Adam LaRoche     | WSN  | 1B   | 13.2     | 40.9       | 27.6  |    0 |    3 |    0 |
+------------------+------+------+----------+------------+-------+------+------+------+

I acknowledge a lot of flaws in this analysis. The location used is a bit crude. I noticed that the MLB Gameday web application has not only the play-by-play that ends up in Retrosheet, but also another section called “feed”. It's mostly about people playing on Twitter, but for each batted ball it listed its speed and how far it travelled. After wondering how the pitch could go faster than the batted ball, I got to thinking how this could be a tremendous improvement on location if they also gave the angle between the baselines that the ball's trajectory took. Then we could very precisely locate the play, and render a probability that the average fielder would catch up with the ball. Using this technique,

     location = f(distance hit, angle from the home-2B field centerline, time to get to distance)      (11)
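
One ingredient of Eq. (11) can be sketched as follows: turn distance and angle into a point on the field, then ask how fast a fielder would have to run to reach it in the available time. This is only an illustration under assumptions that are not in the text: the angle is taken as positive toward right field, the fielder's starting position is a made-up input, and turning the required speed into a catch probability is left open.

```python
import math


def landing_point(distance_ft, angle_deg):
    """Field coordinates of the ball: y runs along the home-2B centerline,
    x is perpendicular to it (assumed positive toward right field)."""
    theta = math.radians(angle_deg)
    return (distance_ft * math.sin(theta), distance_ft * math.cos(theta))


def required_speed(fielder_xy, distance_ft, angle_deg, time_s):
    """Average speed (ft/s) the fielder needs to reach the landing point."""
    if time_s <= 0:
        raise ValueError("time must be positive")
    x, y = landing_point(distance_ft, angle_deg)
    gap = math.hypot(x - fielder_xy[0], y - fielder_xy[1])
    return gap / time_s


# Hypothetical play: ball travels 300 ft straight up the centerline in 4 s,
# center fielder starts 320 ft from home on the centerline.
speed_needed = required_speed((0.0, 320.0), 300.0, 0.0, 4.0)  # 20 ft in 4 s
```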

Note the location is now expressed as space-time, not just space!

Another area for improvement is that the analysis deals well with the primary fielder or runner, but not with the secondary ones. The primary runner is the batter, and the primary fielder is the first one to touch the ball after it leaves the bat. There are places where the secondary fielders are important, like the second man on the double play, or the pitcher on a stolen base, or the first baseman handling errant throws. Secondary runners who are important include a runner going from first to third on an outfield hit.


1 My attempt to do a multiple regression analysis on the complete set of seven factors using matrix algebra (β = (XᵀX)⁻¹XᵀY) resulted in nonsense values. The coefficient for strikeouts was positive.

2 The set of locations used to describe ground balls in 2014 scoresheets are:

+---------------------+-----------------+
| 1B                  |            2525 |
| 2B                  |            8243 |
| 2B-1B               |            7651 |
| 3B                  |            2577 |
| C                   |              21 |
| CF                  |             419 |
| Deep 1B             |               6 |
| Deep 2B             |             130 |
| Deep 2B-1B          |             260 |
| Deep CF             |               5 |
| Deep CF-RF          |               2 |
| Deep LF             |               9 |
| Deep LF Line        |               1 |
| Deep LF-CF          |               4 |
| Deep LF-CF to 3B    |               1 |
| Deep RF             |               1 |
| Deep SS             |              29 |
| Deep SS-2B          |              31 |
| Deep SS-3B Hole     |               1 |
| Front of Home       |            2447 |
| Front of Home to 1B |               1 |
| Front of Home to P  |               3 |
| LF                  |            1671 |
| P                   |            1959 |
| P-s Left            |             375 |
| P-s Right           |             241 |
| RF                  |            1121 |
| Short 1B Line       |             418 |
| Short 3B Line       |             655 |
| Short CF            |               1 |
| Short CF-RF         |               2 |
| Short RF            |              10 |
| Short RF Line       |               1 |
| SS                  |            9843 |
| SS-2B               |            5541 |
| SS-3B Hole          |            7734 |
| Weak 1B             |             307 |
| Weak 2B             |             601 |
| Weak 2B-1B          |            1135 |
| Weak 3B             |            3234 |
| Weak SS             |            1183 |
| Weak SS-2B          |             553 |
+---------------------+-----------------+

The set of locations used to describe fly balls in 2014 scoresheets are:

+-----------------+-----------------+
| CF              |            1892 |
| CF-RF           |            1251 |
| Deep 1B         |              55 |
| Deep 2B         |              45 |
| Deep 2B-1B      |             102 |
| Deep 3B         |              39 |
| Deep CF         |            4718 |
| Deep CF-RF      |            2777 |
| Deep LF         |            1908 |
| Deep LF Line    |             508 |
| Deep LF-CF      |            2827 |
| Deep RF         |            2001 |
| Deep RF Line    |             660 |
| Deep SS         |              53 |
| Deep SS-2B      |              31 |
| Deep SS-3B Hole |             100 |
| LF              |            1246 |
| LF Foul         |             215 |
| LF Line         |             334 |
| LF-CF           |            1225 |
| RF              |            1529 |
| RF Foul         |             252 |
| RF Line         |             438 |
| Short CF        |             818 |
| Short CF-RF     |             567 |
| Short LF        |             810 |
| Short LF Line   |             271 |
| Short LF-CF     |             636 |
| Short RF        |             939 |
| Short RF Line   |             306 |
+-----------------+-----------------+

The set of locations used to describe line drives in 2014 scoresheets are:

+------------------+-----------------+
| 1B               |             172 |
| 1B Foul          |               8 |
| 2B               |             652 |
| 2B-1B            |             467 |
| 3B               |             154 |
| 3B Foul          |               2 |
| CF               |            3262 |
| CF-RF            |            1526 |
| Deep 1B          |              43 |
| Deep 2B          |              87 |
| Deep 2B-1B       |             152 |
| Deep 3B          |              40 |
| Deep CF          |            2426 |
| Deep CF-RF       |            1998 |
| Deep LF          |            1295 |
| Deep LF Line     |            1145 |
| Deep LF-CF       |            2037 |
| Deep RF          |            1160 |
| Deep RF Line     |            1119 |
| Deep SS          |              91 |
| Deep SS-2B       |              62 |
| Deep SS-3B Hole  |              98 |
| Front of Home    |              30 |
| LF               |            1917 |
| LF Foul          |               2 |
| LF Line          |             470 |
| LF-CF            |            1670 |
| P                |             199 |
| P-s Left         |              10 |
| P-s Right        |               8 |
| RF               |            2044 |
| RF Foul          |               3 |
| RF Line          |             517 |
| Short 3B Line    |               3 |
| Short CF         |            1001 |
| Short CF-RF      |             736 |
| Short LF         |            1316 |
| Short LF Line    |             233 |
| Short LF-CF      |             831 |
| Short RF         |            1421 |
| Short RF Line    |             253 |
| SS               |             691 |
| SS-2B            |             127 |
| SS-2B to P       |               1 |
| SS-3B Hole       |             491 |
| SS-3B Hole to SS |               1 |
| Weak 1B          |              10 |
| Weak 2B          |               7 |
| Weak 2B-1B       |              34 |
| Weak 3B          |             105 |
| Weak SS          |              23 |
| Weak SS-2B       |              23 |
+------------------+-----------------+

3 The symbol ξ is the Greek letter xi. In engineering school it was always impressive to have a blackboard full of squiggles, particularly when you weren't sure if the answer was seven. It is also used by Baseball Reference as a playful way to provide a random page.