Tuesday, March 19, 2013


alt NCAA tournament

I'm not one of those people who think the NCAA tournament needs to be bigger; fairness (defined as "maximizing the likelihood that a top team wins the tournament") would dictate that it be smaller.  I understand, though, that people like the spectacle of all those games, and indeed the "automatic bids" that go to the champions of conferences seem to be popular, especially when they result in upsets.  The thing is, even with 31 automatic bids, a 48-team field leaves 17 "at large" bids; since 9 of the top 26 teams this year were their conference champions, the other 17 fit exactly into those at-large slots, which means even a 48-team tournament can include all the automatic bids as well as any team that has a realistic claim on being the best team in the country.

One way to do a 48-team tournament is the way it was done 30 years ago; the bottom 32 teams play an extra round in a single-elimination tournament.  Having previously presented a scheme for a 12-team sesqui-elimination tournament, I propose that that scheme be used for each of the four regions, at least for the first four rounds; the top four seeds from each region would play a single-elimination tournament the first weekend, with the "champion" of that bracket seeded into the final 8, while the other three teams are dropped into a single-elimination bracket with the other 8 teams, and would have to make it through a second weekend undefeated in order to make it into the final 8.
#1 Indiana
#4 Syracuse
#2 Miami
#3 Marquette
#12 James Madison
#6 Butler
#11 Long Island
#7 Belmont
#10 Harvard
#8 Bucknell
#9 Davidson

#1 Louisville
#4 Saint Louis
#2 Duke
#3 Michigan State
#5 Oklahoma State
#12 North Carolina A&T
#6 Memphis
#11 Albany
#7 Creighton
#10 Northwestern State
#8 New Mexico State
#9 Valparaiso

#1 Kansas
#4 Michigan
#2 Georgetown
#3 Florida
#5 VCU
#12 Liberty
#11 Western Kentucky
#7 San Diego State
#10 Florida Gulf Coast
#8 Akron
#9 South Dakota State

#1 Gonzaga
#4 Kansas State
#2 Ohio State
#3 New Mexico
#5 Wisconsin
#12 Southern
#6 Arizona
#11 Iona
#7 Oregon
#10 Pacific
#8 Mississippi
#9 Montana

Thursday, March 14, 2013


geometry of child-herding

My son is walking, and running, and exhibits an independent streak. I can still run faster than he can, and, while I like to give him freedom to roam, I always want to make sure I'm positioned such that I can outrun him to the street.

He, in the above diagram, is C, and the vertical line is the curb.  The larger ellipse (or semi-ellipse) has him at one focus, with the curb forming its minor axis.  This particular ellipse is drawn supposing I can run twice as fast as Calvin can; a larger ratio results in a bigger, less eccentric (more circular) shape, while a ratio close to 1 would shrink the ellipse to little more than a lane between him and the curb.  A ratio of 2 seems about right, though, and is a good one for illustrative purposes.  Thus the rule is that I have to stay within the ellipse; as long as I do, there is no point along the vertical line to which he can out-race me.  As he moves around, I can envision the ellipse moving with him, shrinking when he moves toward the curb and expanding when he moves away from it.
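The ellipse rule can be checked numerically. The sketch below assumes a coordinate system the post doesn't specify: the curb is the line x = 0, the child stands at (-d, 0), and k is the ratio of my speed to his (k > 1). Requiring |D - T| <= k |C - T| for every curb point T works out to an ellipse centered on the curb with semi-axis k*d perpendicular to the curb and d*sqrt(k^2 - 1) along it, which puts the child at a focus. The function names are hypothetical.

```python
import math

def inside_safe_ellipse(x, y, d, k=2.0):
    """True if a parent at (x, y) is inside the ellipse described above.

    Assumed setup (not from the post): curb is the line x = 0,
    child at (-d, 0), parent runs k times as fast as the child (k > 1).
    """
    a = k * d                       # semi-axis perpendicular to the curb
    b = d * math.sqrt(k * k - 1.0)  # semi-axis along the curb
    return (x / a) ** 2 + (y / b) ** 2 <= 1.0

def can_win_every_race(x, y, d, k=2.0, span=50.0, steps=20001):
    """Brute-force version: compare race times at sampled curb points (0, t)."""
    for i in range(steps):
        t = -span + 2.0 * span * i / (steps - 1)
        parent_dist = math.hypot(x, y - t)
        child_dist = math.hypot(d, t)
        if parent_dist > k * child_dist + 1e-12:
            return False
    return True
```

The two functions should agree at any point that isn't right on the boundary; the brute-force one is only there as a sanity check on the closed form.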

I've placed myself (D) just inside the ellipse.  If he suddenly decides he wants to maximize his chances of making it to the street, and I want to maximize my chances of catching him (suppose we're not quite certain of the 2:1 ratio), then we're both racing toward the same spot along the curb.  This point (T) can be constructed by drawing the line whose points are equidistant from the two of us; that line intersects the curb at a single point, such that points on the curb to one side of that intersection are closer to me than to him, and points on the other side are closer to him.  A circle can be drawn with that intersection as its center and both of us on its circumference; T is where the circle intersects the curb on the side closer to him than to me.  If I started exactly on the ellipse, and we both started running toward T at the same time, I would catch him exactly at the curb; if he runs in any other direction and I start running toward T, then I will be inside the new ellipse, while if he runs straight toward T and I run in any other direction, I will be outside the new ellipse.  Note, however, that T was constructed without regard to the ellipse; it was constructed based only on his location, my location, and the curb.  If I'm suddenly unsure about the ratio of our speeds and whether I'm in the proper ellipse or not, the most conservative thing I can do is head toward T (assuming, of course, that this doesn't affect his behavior).
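The construction of T translates fairly directly into coordinates. This is a minimal sketch under the same assumed setup as before (curb along x = 0, both of us off the curb); the name race_target is hypothetical. It finds where the equidistant line meets the curb, draws the circle through both positions, and keeps the curb crossing at which the child's distance is smallest relative to mine.

```python
import math

def race_target(child, parent):
    """Curb point (0, t) where the child's distance is smallest relative to the parent's.

    Assumes the curb is the line x = 0 and that neither runner is standing on it.
    """
    cx, cy = child
    dx, dy = parent
    if cy == dy:
        # Equidistant line is parallel to the curb; sensible when the child is
        # the one nearer the curb, in which case he just runs straight at it.
        return (0.0, cy)
    # Point (0, o) on the curb equidistant from child and parent.
    o = (cx * cx + cy * cy - dx * dx - dy * dy) / (2.0 * (cy - dy))
    r = math.hypot(cx, cy - o)      # radius of the circle through both runners

    def ratio(t):
        return math.hypot(cx, cy - t) / math.hypot(dx, dy - t)

    # The circle crosses the curb twice; T is the crossing that favors the child.
    return (0.0, min((o - r, o + r), key=ratio))
```

For example, race_target((-1.0, 0.0), (-1.5, 1.0)) lands a little below the child's own height on the curb, on the side away from me.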

Sunday, April 18, 2010

The Dormouse and the Doctor, from one of the When We Are Very Six books by A. A. Milne. The story of experts inflicting personal preferences on others in the name of their expertise is familiar.

Saturday, January 17, 2009

When a team prepares to punt, the punter's statistics are often cited, typically the average length of his punts and the number of times he has left teams behind their own 20 yard line. These seem like kind of strange statistics to me; if I were to take the line of scrimmage and the end position of the ball and plot one against the other, what I would likely expect to see, as a first approximation, would be a 45 degree line up to a point, and then a horizontal line from there on out. Behind a certain point on the field, a punter would be expected to net a certain length; ahead of that, he would be expected to average a certain level of field position. Grabbing every punt the Packers made that year, I found that the break-point from a least squares fit was very near midfield. Accordingly, it seems to me we ought to characterize the net length of punts from one's own half of the field, and the average final field position for punts from the fifty yard line and beyond.
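One way to do a fit of this shape is sketched below. It is not necessarily the procedure used for the Packers data: it assumes positions are measured in yards from the punting team's own goal line, models the landing spot as min(scrimmage, b) + length, and simply tries each candidate breakpoint b. The function name and the sample numbers are made up for illustration.

```python
def fit_hinge(scrimmage, landing, candidates):
    """Least-squares fit of landing ~ min(scrimmage, b) + length over breakpoints b.

    Returns (b, length). For a fixed b the best length is just the mean residual.
    """
    best = None
    for b in candidates:
        clipped = [min(x, b) for x in scrimmage]
        length = sum(y - c for y, c in zip(landing, clipped)) / len(landing)
        sse = sum((y - c - length) ** 2 for y, c in zip(landing, clipped))
        if best is None or sse < best[0]:
            best = (sse, b, length)
    return best[1], best[2]

# Made-up, noise-free punts: 40 net yards until the ball would pass the 90.
scrimmage = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
landing = [60, 65, 70, 75, 80, 85, 90, 90, 90, 90]
break_point, length = fit_hinge(scrimmage, landing, range(30, 71))
```

With real data the residuals won't vanish, but the breakpoint with the smallest squared error plays the same role.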

Taking the data from The Football Project for 2005, I calculated these statistics for each player who punted. Every player who punted more than twice had at least one punt from each half of the field, so the figures for them are well defined. Remember, the "length" is only calculated for those punts from the punter's own end of the field; the "depth", the name of which is probably more poetically than logically motivated, is the average ensuing field position of the receiving team after punts from the fifty and beyond. I use results net of the return, though using results before the return leaves a lot of what follows more or less unchanged. The players are ordered by length-depth/4, due to the fact that about 4/5 of punts originated from the punting team's side of the fifty.
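The two statistics and the ordering can be sketched as follows. The input format is an assumption (each punt as punter, line of scrimmage, and final spot after the return, all in yards from the punting team's own goal line), and punter_table is a hypothetical name; the sample punts used in the check are invented.

```python
from collections import defaultdict

def punter_table(punts):
    """Rows of (punter, length, depth, number of punts), best first.

    punts: iterable of (punter, scrimmage, final), yards from the punting
    team's own goal line, with final measured after the return.
    """
    own_side = defaultdict(list)  # net yards on punts from behind midfield
    far_side = defaultdict(list)  # receiving team's yard line, punts from the 50 and beyond
    for punter, scrimmage, final in punts:
        if scrimmage < 50:
            own_side[punter].append(final - scrimmage)
        else:
            far_side[punter].append(100 - final)
    rows = []
    for punter in own_side.keys() & far_side.keys():
        length = sum(own_side[punter]) / len(own_side[punter])
        depth = sum(far_side[punter]) / len(far_side[punter])
        count = len(own_side[punter]) + len(far_side[punter])
        rows.append((punter, length, depth, count))
    # Smaller depth is better, so it is subtracted at a quarter of the weight.
    rows.sort(key=lambda r: r[1] - r[2] / 4, reverse=True)
    return rows
```

Punters with no punts from one half of the field are left out, since one of their two figures would be undefined.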
[Table: punter, length, depth, number of punts]
Number one is Brian Moorman, of the Buffalo Bills; second is Donnie Jones. They are the only two punters to average more than 40 net yards from their own end of the field; of punters who punted more than twice, the two who left the ball inside the ten yard line when they punted from midfield or closer were Ben Graham, who had pretty good length as well, and Nick Harris, whose length was more mediocre.

Adding the length and depth for each player with more than two punts, I get a surprisingly narrow distribution. It is centered around 51.4 or 51.5 — 50.5 would be ideal for the use of these statistics — and has a standard deviation of only 3.5 yards. Most punters, then, seem to punt for distance behind their own 49 or so, and for field position beyond there. If I exclude Ryan Flinn, who had six punts (the fewest among those with more than two) for the worst result in both statistics (among those with more than two punts), the correlation between length and depth is 0 to two decimals.* Accordingly, a punter with better length will tend to be affected by the endzone further into his own territory, while a punter who is particularly good at pinning the opposing team against its goal line is more likely to still be punting for length a bit beyond the fifty; there is no unambiguous connection, independent of one's measure of "skill", between a punter's "breakpoint" and the skill of the punter.

It won't come as a great surprise that the length as I measure it and the average length of all punts have a correlation greater than 0.9. It might not be a big surprise either that the percentage of punts to end up inside the twenty has a correlation of -0.4 with "depth", but, interestingly, either length measurement has a correlation of 0.4 with the inside-the-twenty statistic. From a linear regression standpoint, it looks as though the inside-the-twenty statistic is including some length information; 1/3 of the variance can be explained from the two numbers in my table. The median punt to end up inside the 20 starts from 2 yards behind midfield, but 20% come from behind the punter's own 40; some of what is being recorded in that figure is not any deftness in terms of avoiding the touchback or letting one's teammates get downfield, but is simply the ability to kick to the red zone from farther away. This is a nice skill, of course, but it is fully incorporated into the length statistic; the frequency of leaving a punt inside the twenty is a hybrid of skills, and is not the best measure for any of them.

There is some attempt here to keep the statistics simple. In fact, the fitted line is slightly flatter than 45 degrees because the endpoint is bounded both above and below; punts from behind midfield give a slope of 0.95 that is statistically distinct from 1 at the 5% significance level.

* This actually is less true without the return; punters who punt the ball farther before the return also tend to punt it closer to the endzone, but not dramatically so. The distribution of punters' depth+length is similar to the results with the return, with several yards simply moved from depth to length.

One of the things I often wondered before discovering The Football Project was how the probability of a kicker making a field goal varied as a function of distance. After eyeballing the distributions for a few kickers for the 2005 season, I figured I could try raising a logistic function to some power. For the first several kickers I tried, I found that that power was statistically indistinguishable from 1, so I set about fitting the probabilities to a simple logistic function, i.e. (1/2)(1+tanh((m-x)/w)).

I had imagined, in the absence of data, that w might be independent of the kicker, and that kickers could be characterized by m, i.e. how far away they are when their percentages drop. This is not the case; w depends on the kicker, with larger values for kickers who tend to miss easy ones and make longer ones, and lower values for more consistent kickers. Olindo Mare missed a few short ones, so his percentages didn't drop off very quickly. Matt Bryant actually had a slight improvement as distances got longer; this would surely change if more statistics were taken at a normal range of distances. On the other hand, John Kasay had a much higher tendency to hit field goals shorter than 50 than if they were longer than 50; of the 8 he missed, the shortest was 42 (he made 24 shorter than that). Jeff Reed had an even sharper drop around 45 yards, missing nothing shorter than 41 and making nothing longer than 47.

While I was unable to fairly characterize the best kicker in terms of a drop-off length, I was able to generate a different metric that adjusts for length. By using my logistic fits, I predicted the percentage of field goals a kicker would make if they kicked from a given distance; I then took the 1006 field goal attempts for the season and calculated the percentage of those 1006 field goals that each kicker would have made. I've only included those kickers who attempted more than 4 kicks; the kickers who were dropped were all notably worse than the ones listed.
[Table: kicker, normalized score, percentage, number of kicks]
This obviously does not adjust for wind, and the linemen on both the kicking and defending sides will have some influence on these statistics, but this at least tells which unit is doing better than which other with the confounding variable of distance removed. The average length for a field goal attempt was 36.3 yards; the average for Nick Novak was 33.7, while for Josh Brown it was 41.2. Accordingly the "scores" for these kickers find themselves lower and higher, respectively, than the raw percentage. The scores and the actual percentages have a correlation of 0.8. The means and variances are very similar, though the variance of the raw percentages is a little bit smaller; while the difference isn't statistically significant*, it is what would be expected from coaches deciding to attempt longer field goals with better kickers, and punting or going for the first down with worse kickers. Perhaps looking at all fourth down plays from around the thirty yard line would be a good step for further research.
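The normalized score amounts to averaging a kicker's fitted curve over every attempt in the league. A minimal sketch, assuming each kicker has already been reduced to a fitted pair (m, w) and that league_distances is the list of all attempt distances for the season; the function names are hypothetical.

```python
import math

def make_probability(distance, m, w):
    """Fitted chance of making a kick from `distance`: (1/2)(1 + tanh((m - x)/w))."""
    return 0.5 * (1.0 + math.tanh((m - distance) / w))

def normalized_score(m, w, league_distances):
    """Share of the league-wide set of attempts this kicker would be expected to make."""
    total = sum(make_probability(x, m, w) for x in league_distances)
    return total / len(league_distances)
```

Because every kicker is scored against the same list of distances, a kicker who was mostly sent out for short kicks gets no credit for the easy schedule.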

This isn't a least-squares fit; I try to maximize the sum of the logarithm of the fitted probability of the actual outcome: for kicks that the kicker makes, P is the fitted probability that the kicker would make the kick, while for those the kicker missed (or were blocked or whatever), it is the fitted probability that the kicker would miss the kick.
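The criterion described here is a maximum-likelihood fit. The sketch below is one crude way to carry it out, by trying every (m, w) on a grid; the original fits may well have used a proper optimizer. The function names and the (distance, made) input format are assumptions.

```python
import math

def log_likelihood(m, w, kicks):
    """Sum over kicks of the log of the fitted probability of what actually happened.

    kicks: list of (distance, made) pairs, with made a bool.
    """
    total = 0.0
    for distance, made in kicks:
        p = 0.5 * (1.0 + math.tanh((m - distance) / w))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite at the extremes
        total += math.log(p if made else 1.0 - p)
    return total

def fit_kicker(kicks, m_grid, w_grid):
    """Return the (m, w) pair on the grid with the highest log-likelihood."""
    pairs = ((m, w) for m in m_grid for w in w_grid)
    return max(pairs, key=lambda mw: log_likelihood(mw[0], mw[1], kicks))
```

A kicker like the Jeff Reed case above, with no misses short and no makes long, pushes w toward the smallest value the grid allows, which is one reason to keep the probabilities clipped away from 0 and 1.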

* It would be significant at the 25% significance level on a two-tailed test; arguably a one-tailed test could be used here, but even that isn't going to pass a common significance test.

Friday, July 25, 2008

When I was a computer programmer, I had a coworker who often preferred very different kinds of solutions to problems than I did. If I proposed a design of some sort that was different from what he would typically do, he would frequently use the word "just" (in the sense of "merely" or "simply") as, it felt to me, something of a cudgel; he would be saying something that sounded to my ear like "Why would you take Tylenol for a headache when you could just self-administer brain surgery?" I don't know whether it was an attempt on his part, conscious or unconscious, to make me think that, well, yeah, brain surgery does sound simpler, but it always sounded clumsily tendentious to me.

In more subtle contexts, though, such verbal patterns do affect our thinking, and often our own language places constraints on our own thoughts. As George Orwell noted, sloppy language is often indicative of sloppy thinking, and wordplay is sometimes used as a substitute for thinking or argument. Oratory involving slippery homonyms usually strikes me as disingenuous, but it may often be that the speaker has confused himself before offering his confusion to others; he's not making a bad argument in an attempt to persuade people of something he believes for other reasons, but is instead sharing his own sloppy reasoning processes.

Homonyms are, I think, one of the more prolific sources of this kind of confusion, but consider how much damage can be done by the simple (and ubiquitous) word "the". Prepending a noun-phrase with the definite article can imply both existence and uniqueness in a way that the listener often won't even bother to question. Consider a famous "paradox" from mathematical set theory: does "the" set of sets that do not contain themselves contain itself? If I mention "the set that contains 2 and 5", the average listener, without even realizing it, will mentally add the restriction "and contains no other elements" to make the set unique; if I say "the set that contains 2 and 5 but not 4", the listener will be thrown off by the explicit restriction, and may realize that, but for the "the", there's no reason to believe that this set has been fully specified. As for existence, a more explicit contradiction — "the set that contains 4 but not 4" — might engage a similar awareness, but a broader description of a set ("the set that contains all sets that do not contain themselves, and has no other elements") seems to make people feel as though there ought to be one and only one set meeting that description, and that one's inability to answer binary questions about it (does it contain itself?) is a paradox. The careless listener assumes the question should have the answer "yes" or the answer "no", when in fact it is simply ill-posed; the phrase "the set that contains all sets that do not contain themselves, and has no other elements", like "green day" or "too much garlic", obeys all the rules of syntax but corresponds to no actual or even imaginable construct.

This, in fact, is a slightly broader phenomenon; it is the "will you stop beating your wife?" question. Putting an assumption at the heart of a question can evade the detection of the assumption; the listener (and perhaps speaker) will be busy setting off to answer the question when the question itself actually makes no sense. Listeners are used, perhaps, to looking for false statements; when a statement isn't even coherent enough to be wrong, they may accept whatever is necessary to fit it into their expectations. Listeners should be more careful about analyzing what they hear — and speakers, at least those who intend good faith, should be careful about constructing what they say.

Thursday, July 17, 2008

I've been thinking lately about a comment by physicist Richard Feynman that a good physicist should be able to work out a physics problem in several different ways. The same is true of economics; if we have different tools for analyzing problems then, to the extent that they're all correct, they should get the same answer to the same question. An example is the effect of an import tariff on a nation that is a large importer of the taxed good.

One way to view this is initially to view the nation as a single entity, and to look at it as a monopsonist, or at least a market-moving buyer on the world stage. To optimize its own interests, it should reduce its purchases below what it would buy if it were a price-taker, thereby lowering the price on the units it does purchase. Efficient allocation of the reduced purchase among residents of the country should, for the usual reasons, be achieved by allowing them to trade at a single price within the country; the artificial reduction of quantity imported will increase the domestic price while reducing the world price, and the optimal tariff, from this standpoint, is the difference between the domestic price and the world price at the optimal consumption level.

Insofar as the country is not a unitary actor, perhaps this is better thought of as a buyers' cartel, but, to the extent that it's able to enforce internal cooperation, the external economics look the same. It is in the interest of each member of the cartel to cheat -- to buy more of the good at the world price, rather than the domestic price. As each individual does so, though, they bid up the price faced by everyone else, reducing the welfare of their fellow citizens by more than they increase their own welfare.

This gets us to a second way of viewing the same problem, in terms of pecuniary externalities. More buyers or sellers in a market may move the price up or down, but they won't have an effect on overall Marshallian welfare; they simply transfer it back and forth between buyers and sellers. As I've constructed this situation, though, we don't ascribe any value to the welfare of foreigners, who are net sellers, only to that of our fellow citizens, who are net buyers; a purchase, then, by placing upward pressure on the price, represents a welfare transfer away from our fellow citizens. An optimal Pigovian tax would impose this externality on the purchaser in the amount that it would fall, on net, on his fellow citizens; where the world price differs from the domestic price by the amount an additional unit purchased is likely to cost the fellow citizens in increased costs, the buyers will find their equilibrium, and it should be at the same optimal level inferred from the monopsony argument.

Of course, if we valued foreigners' welfare equally to that of domestic citizens, there would be no externality to tax; that Pigovian tax, to first order, represents welfare that would otherwise be gained by foreigners from the additional unit purchased. The tax is economically incident, in part, on the foreigners, and this offers a third treatment of the problem: we wish to impose a tax such that the amount of revenue effectively derived from the foreign exporters from a marginally higher or lower tax would be offset by further welfare losses associated more directly with the lower domestic use of the good at higher prices. This is another standard paradigm into which the problem can be put and, yet again, it should yield the same result. This is the paradigm that makes it most easily apparent, though, that it is also in the interest of a large net exporter of a good to tax that good -- driving up world prices, with the tax falling partly on foreigners -- rather than to try to subsidize it, as is more often what mercantilist impulses seem to lead nations to implement.
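The claim that the three treatments agree can be checked in a toy model. Everything specific here is an assumption made for illustration: no domestic production, a linear domestic demand curve with price a - b*q, and a linear foreign supply curve with world price c + d*q. The function names are hypothetical. The first function uses the monopsony condition, the second iterates the Pigovian rule that the tariff equal the price increase one more unit imposes on all other units bought, and the third searches for the tariff that maximizes consumer surplus plus tariff revenue.

```python
def quantity_under_tariff(t, a, b, c, d):
    """Imports when buyers pay the world price plus t: a - b*q = c + d*q + t."""
    return max((a - c - t) / (b + d), 0.0)

def national_welfare(t, a, b, c, d):
    """Consumer surplus plus tariff revenue; foreigners' welfare is ignored."""
    q = quantity_under_tariff(t, a, b, c, d)
    return 0.5 * b * q * q + t * q

def tariff_monopsony(a, b, c, d):
    # Marginal value a - b*q equals marginal outlay c + 2*d*q at the optimum.
    q = (a - c) / (b + 2 * d)
    return (a - b * q) - (c + d * q)  # domestic price minus world price

def tariff_pigovian(a, b, c, d, rounds=200):
    # One more unit raises the world price by d on each of the q units bought.
    t = 0.0
    for _ in range(rounds):
        t = d * quantity_under_tariff(t, a, b, c, d)
    return t

def tariff_revenue_tradeoff(a, b, c, d, step=0.01):
    # Grid search over tariffs for the best surplus-plus-revenue total.
    grid = [i * step for i in range(int((a - c) / step) + 1)]
    return max(grid, key=lambda t: national_welfare(t, a, b, c, d))
```

With a = 100, b = 1, c = 20, d = 1, all three come out at a tariff of 80/3 (the grid search only to within its step size), which is the agreement the argument above says to expect.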

Note that this is all without regard to any other Pigovian taxes one might impose on the product for other externalities; if consumers of the good, besides bidding up prices and effecting a transfer of wealth out of the country, also impose other negative externalities on their fellow citizens, even higher Pigovian taxes would be justified. The arguments above do not suppose such externalities, and are independent of them.

This is all under the ceteris paribus assumption, and the assumption that the welfare of the exporters is to be ignored. If a tariff is likely to lead to a trade war, that could well cost more than the net benefit of the tariff; if, conversely, a free trade regime can be negotiated and all parties are likely to adhere to it, that is likely to improve welfare for each country more than if each country separately starts taxing trade in attempts to optimize its own welfare by itself. On the other hand, if many of the exporters of a particular good are actually using proceeds from the sales to actively harm a country's interest, so that the importing country might view the exporters' economic welfare as negative, then the arguments apply all the more strongly.

Saturday, February 10, 2007

This post intentionally left blank.
