Thursday, September 13, 2007

Build-Your-Own NBA Draft Kit, Part 3: Laws of Averages

When we last left off, we had imported all the raw data for every player in the NBA into an Excel spreadsheet, and eliminated (usually) unnecessary categories such as PF, Ejections, and Games Started. What you should have is something that looks like this:


Before we compute per-game averages for each player and category, allow me a word as to why I use averages instead of totals. Simply put, averages are a much better way of gauging a player's impact than totals. If a player gets suspended or injured, that will of course hurt their categories from a totals point of view, while the effect on averages will be minimal. Now, injury is of course something that needs to be considered when assembling a team (see the later installment "The Human Element," but it's easier to remember which players were injured and bump then down than to scour the entire list, flagging players who missed a lot of games, then move them up drastically in your rankings. For example, if Star Player X missed 60 games due to a freak broken leg, he would have a terrible ranking based on totals, but by averages he should be right in line with previous years. Comparing by averages doesn't harm this player for his freak injury. Of course, I will revisit the topic of injuries later.

Now that I have made the decision to use averages instead of totals in our comparisons, let's get on with it. For "counting" categories like points, rebounds, and steals, quite obviously you need to divide that category by the number of games played by that player. For instance, to calculate Shareef Abdur-Rahim's average field goals per game, you would divide cell C2 by cell B2 (in the above picture). How do we do this?

First, we need to insert a column to the right of "FGM." Right click on the "D" column (or whichever column corresponds to FTM), then choose "insert" and Excel will magically insert an empty column between what were previously columns C and D.

Second, in cell D1 type "FGM/G" or some other shorthand to indicate what will lie in the column below. Resize the column if necessary.

Third, in cell D2 type "=C2/$B2" and hit enter (don't use the quotation marks, please). This will show a value of "0.5" in cell D2.

Fourth, left click on cell D2 and hold, drag down to the end of the spreadsheet (cell D452), then release the left mouse button and use Ctrl+D on your keyboard. This does what is known as "fill down," which can also be selected by going to Edit->Fill->Down. A number should appear in every cell in that column.

Notice that the cells have varying decimal places (i.e. some say 3.5, others 4.7344)? This can be corrected. Your column should still be highlighted. If it's not, click on the "D" at the top of the column. Right click, go to "Format Cells," choose the "Number" tab and the "Number" category, then "3" under "Decimal Places." You can of course use 2 or 1 or 4 if you wish, but I think 3 gives a good amount of certainty without flooding your spreadsheet with numbers.

This procedure can be used for all the "counting" categories (in my case, FGM/G, FTM/G, 3PTM/G, PPG, RPG, APG, SPG, and BPG). For categories like A/TO ratio and percentages, you need to do the same as above, but instead of automatically dividing by your "C" column (games played), you divide by FGA, FTA, or TO. For example, to calculate FT% you divide the FTM cell by the FTA cell. 4 decimal places should work just fine for these ratio categories.

If your column has a bunch of "######" cells in it, it means that your column is not wide enough to display the digits. Just double click on the cell border at the top of the column (i.e. next to the letter D) to widen the column.

One more thing; if a cell reads "DIV/O" it means that that player did not attempt a field goal (or free throw) during the season. For example, Kaniel Dickens (whoever the heck that is). Feel free to just delete these players right now to save you the hassle. Right click on the row number (i.e. "22") and choose "Delete." Alternately, if you wish to keep these players in the spreadsheet, you can chance the formula for that specific cell to "=IF(J110=0,0,J110/I110)" which states that if the player's FTA is 0, set the cell to 0, otherwise calculate it as normal.

When you are done, your spreadsheet should look something like this (this is a default 9 cat roto league setup):






As I promised last time, a brief word about what metrics we are going to use to compare players. For points leagues, going from here is really easy. You can just figure out how many points your player would get per game using your league's settings, then make a ranking of them. But for most leagues, it's just not that simple. For example, suppose we have 2 players, player A and player B. They are in every category identical, except for two, points and blocked shots. Player A averages 16 points per game, and player B averages 8 ppg. However, player A averages 1.5 blocked shots per game, while player B averages 3 bpg. A quick thought might tell you that the players are equal, because while player A is twice as good in one category, he's also half as good in another category (all other categories are equal). However, a few moments' thought beyond this would also instruct you that this is just not true. Blocked shots (primarily, but other categories like steals and three pointers) are not worth the same as others (points, rebounds). Having a guy block another 1.5 shots per game easily outweighs his deficiency in scoring. That is, not all categories are created equal; some are more scarce (and therefore more valuable).

So how do we compare players? There are several ways that I have found.

1. Something similar to VORP (value over replacement player) in baseball. Basically, you estimate the "average" player who would be on your waiver wire, and determine how much better (or worse) a player is compared to the average wire player.

2. Take a league average of per-game stats, and compare players to the league average.

3. Compare players to the league leaders in each category.

I will share my preferred method with you next time, and we will get into the real nitty gritty of working up these numbers.

Until then,
-M