Our school had a candy guessing contest for Hallowe’en. There were three Jars of Unusual Shape, in various sizes.
Jar 1: 108 Candies
Jar 2: 141 Candies
Jar 3: 259 Candies
The spirit of Francis Galton demanded that I look at the data. Candy guessing, like measuring temperature, is a classic case where averaging multiple readings from different sensors is expected to do very well. Was the crowd wise? Yes.
- The unweighted average beat:
  - 67% of guessers on Jar 1
  - 78% of guessers on Jar 2
  - 97% of guessers on Jar 3, and
  - 97% of guessers overall
- The median beat:
  - 89% of guessers on Jar 1
  - 83% of guessers on Jar 2
  - 78% of guessers on Jar 3, and
  - 97% of guessers overall
Of the 36 people who guessed all three jars (one anonymous blogger guessed only one jar and was excluded from the analysis), only one beat the unweighted average: the top guesser had an overall error of 9%, versus 11% for the unweighted average. Two other guessers came close, with an average error of 12%. The worst guessers had overall error rates greater than 100%, the worst being 193% too high.
The unweighted average was never the best on any single jar — though on Jar 3 it was off by only 1. (The best guesser on Jar 3 was exactly correct.)
The measure I used was the overall Average Absolute %Error. The individual rankings change slightly if instead we use Absolute Average %Error, but the main result holds.
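The difference between the two measures is the order of operations: Average Absolute %Error penalizes every miss, while Absolute Average %Error lets over- and under-guesses cancel. A minimal sketch, using the actual jar counts from this post but a made-up guesser:

```python
# Actual jar counts from the contest; the guesses below are hypothetical.
actuals = [108, 141, 259]
guesses = [120, 130, 300]

# Signed percent error on each jar.
pct_errors = [(g - a) / a for g, a in zip(guesses, actuals)]

# Average Absolute %Error: take each error's size first, then average.
avg_abs_pct_error = sum(abs(e) for e in pct_errors) / len(pct_errors)

# Absolute Average %Error: average the signed errors first, then take the size.
# Over- and under-guesses can offset each other here.
abs_avg_pct_error = abs(sum(pct_errors) / len(pct_errors))

print(f"Average Absolute %Error:  {avg_abs_pct_error:.1%}")
print(f"Absolute Average %Error:  {abs_avg_pct_error:.1%}")
```

For this example guesser the two measures disagree noticeably, because the low guess on Jar 2 partially cancels the high guesses on Jars 1 and 3 under the second measure — which is exactly why the rankings can shuffle.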
In time for Hallowe’en, we’ve added Shadow Forecasts and other features to help show the awesome power of combo.
- Shadowy: when linked forecasts affect a question, we show “Shadow Forecasts” in the history and the trend graph.
- Pointy: the trend graph now shows one point per forecast instead of just the nightly snapshot.
- Chatty: Comment while you forecast. (But not while you drive.)
Read on for news about upcoming prizes.
Tonight’s release streamlines combo trades, adds some per-question rank feedback, prettifies resolutions, and disables recurring edits.
We’ve redone the approach to trading linked questions: if a question is linked to other questions, you can now make any desired assumptions right from the main trade screen.
The following shows an example of a Scaled or Continuous question:
Instead of estimating the chance of a particular outcome, you are asked to forecast the outcome in natural units like $. Forecasts moving the estimate towards the actual outcome will be rewarded. Those moving it away will be penalized. As with probability questions, moving toward the extremes is progressively more expensive: we have merely rescaled the usual 0%-100% range and customized the interface.
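Under the hood, a scaled question is just a probability question whose 0%–100% range has been mapped onto the question's units. A minimal sketch of that mapping, assuming a hypothetical linear scale with made-up endpoints (the real interface and endpoints may differ):

```python
# Hypothetical question range in dollars; these endpoints are assumptions,
# not taken from the post.
low, high = 0.0, 1_000_000.0

def to_unit(value_dollars):
    """Map a dollar forecast onto the underlying 0-1 scale."""
    return (value_dollars - low) / (high - low)

def to_dollars(p):
    """Map a point on the underlying 0-1 scale back to dollars."""
    return low + p * (high - low)

print(to_unit(250_000.0))   # 0.25
print(to_dollars(0.9))      # 900000.0
```

Because the scale is just a relabeled probability range, the same cost structure applies: forecasts near the endpoints sit near 0% or 100% on the underlying scale, which is why moving toward the extremes is progressively more expensive.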
Forecasters frequently want to know why their forecast had so much (or so little) effect. For example, Topic Leader jessiet recently asked:
I made a prediction just now of 10% and the new probability came down to 10%. That seems weird: that my one vote would count more than all past predictions? I assume it’s not related to the fact that I was the question author?
The quick answer is that she used Power mode, which is our market interface, and that’s how markets work: your estimate becomes the new consensus. Sound crazy? Note that markets beat out most other methods over the past three years of live geopolitical forecasting in the IARPA ACE competition. For two years we ran one of those markets, before we switched to Science & Technology. So how can this possibly work? Read on for (a) How it works, (b) Why you should start with Safe mode, (c) The scoring rule underneath, and (d) An actual example.
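To see why "your estimate becomes the new consensus" can still reward accuracy, consider a logarithmic market scoring rule — a common mechanism behind prediction markets. The exact rule and liquidity parameter used by SciCast are not specified here, so this is a sketch under assumptions, not the site's actual implementation:

```python
import math

# Sketch of a logarithmic market scoring rule (LMSR). The liquidity
# parameter b is a made-up value: larger b means prices move less per
# point risked.
b = 100.0

def lmsr_payoff(p_old, p_new, outcome):
    """Points earned for moving the consensus from p_old to p_new.

    Positive if you moved the estimate toward what actually happened,
    negative if you moved it away.
    """
    if outcome:  # the event occurred
        return b * math.log(p_new / p_old)
    else:        # the event did not occur
        return b * math.log((1 - p_new) / (1 - p_old))

# A hypothetical trade moving the consensus from 60% down to 10%:
print(round(lmsr_payoff(0.60, 0.10, outcome=False), 1))  # gain if it doesn't happen
print(round(lmsr_payoff(0.60, 0.10, outcome=True), 1))   # loss if it does
```

The asymmetry in the example is the point: a trader who drags the consensus to an extreme stands to lose far more than she can gain unless she is right, which is what keeps "your estimate becomes the new consensus" honest.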
A new SciCast ad campaign has created ~1,000 registrations per day for the past couple of days. That has doubled our forecaster community and created a lot of activity, which is great. But it also generated a lot of email notifications for users who had opted to receive updates for new comments, and more email is not always great.
After a dozen or so complaints and a review of some comments, we have disabled email notifications until we add some more controls.