About ctwardy

2015: Senior Data Scientist, Defense Suicide Prevention Office (via NTVI Federal), and Affiliate Professor, George Mason University. 2008-2015: Research Assistant Professor at George Mason University. Areas: Judgment & Decision-making; Machine Learning; Statistical Inference; Philosophy of Science; Collective Intelligence; Computational Philosophy.

Seven Guidelines for Better Forecasting

Nice summary by longtime colleague and arch argument mapper Tim van Gelder. “The pivotal element here obviously is Track, i.e. measure predictive accuracy using a proper scoring rule.” If “ACERA” sounds familiar, it’s because they were part of our team when we were DAGGRE: they ran several experiments on and in parallel to the site.

Tim van Gelder

“I come not to praise forecasters but to bury them.”  With these unsubtle words, Barry Ritholtz opens an entertaining piece in the Washington Post, expressing a widely held view about forecasting in difficult domains such as geopolitics or financial markets.  The view is that nobody is any good at it, or if anyone is, they can’t be reliably identified.  This hard-line skepticism has seemed warranted by the persistent failure of active fund managers to statistically outperform dart-throwing monkeys, or the research by Philip Tetlock showing that geopolitical experts do scarcely better than random, and worse than the simplest statistical methods.

More recent research on a range of fronts – notably, by the Good Judgement Project, but also by less well-known groups such as Scicast and ACERA/CEBRA here at Melbourne University – has suggested that a better view is what might be termed “tempered optimism” about expert judgement forecasting. This new attitude acknowledges that forecasting challenges will always fall on…



Candy Guessing

Our school had a candy guessing contest for Hallowe’en.  There were three Jars of Unusual Shape, in various sizes.

Jar 1: 108 Candies

Jar 2: 141 Candies

Jar 3: 259 Candies

The spirit of Francis Galton demanded that I look at the data.  Candy guessing, like measuring temperature, is a classic case where averaging multiple readings from different sensors is expected to do very well.  Was the crowd wise?  Yes.

  • The unweighted average beat:
    • 67% of guessers on Jar 1
    • 78% of guessers on Jar 2
    • 97% of guessers on Jar 3, and
    • 97% of guessers overall
  • The median beat:
    • 89% of guessers on Jar 1
    • 83% of guessers on Jar 2
    • 78% of guessers on Jar 3, and
    • 97% of guessers overall
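
For the curious, here is roughly how such “beat X% of guessers” figures can be computed (a minimal sketch using made-up guesses, not the actual contest data):

```python
from statistics import mean, median

# Sketch with made-up guesses, not the actual contest data.
actual = {"Jar 1": 108, "Jar 2": 141, "Jar 3": 259}
guesses = {                      # each forecaster's guesses for the three jars
    "alice": {"Jar 1": 90,  "Jar 2": 150, "Jar 3": 300},
    "bob":   {"Jar 1": 120, "Jar 2": 100, "Jar 3": 200},
    "carol": {"Jar 1": 130, "Jar 2": 180, "Jar 3": 400},
}

def pct_error(estimate, truth):
    """Absolute percentage error of a single estimate."""
    return abs(estimate - truth) / truth

for jar, truth in actual.items():
    jar_guesses = [g[jar] for g in guesses.values()]
    for name, crowd in [("mean", mean(jar_guesses)), ("median", median(jar_guesses))]:
        # Count the guessers whose error exceeds the crowd estimate's error.
        beaten = sum(pct_error(g, truth) > pct_error(crowd, truth) for g in jar_guesses)
        print(f"{jar}: {name} {crowd:.0f} beats {beaten}/{len(jar_guesses)} guessers")
```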

Only one person beat the unweighted average, and two others were close. There were 36 people who guessed all three jars (and one anonymous blogger who guessed only one jar and was excluded from the analysis). The top guesser had an overall error of 9%, while the unweighted average had an overall error of 11%. Two other guessers came close, with an average error of 12%. The worst guessers had overall error rates greater than 100%, with the worst being 193% too high.

The unweighted average was never the best on any single jar, though on Jar 3 it was off by only 1.  (The best guesser on Jar 3 was exactly correct.)

The measure I used was the overall Average Absolute %Error.  The individual rankings change slightly if we instead use Absolute Average %Error, but the main result holds.
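
In case that distinction is opaque: Average Absolute %Error averages the magnitude of each jar’s error, while Absolute Average %Error averages the signed errors first, so over- and under-guesses can cancel. A quick sketch with made-up numbers, not the actual contest data:

```python
# One made-up guesser; not the actual contest data.
actual  = [108, 141, 259]        # true jar counts
guesses = [130, 120, 250]        # this guesser went high, low, low

signed = [(g - a) / a for g, a in zip(guesses, actual)]

# Average Absolute %Error: average the magnitudes of the per-jar errors.
avg_abs = sum(abs(e) for e in signed) / len(signed)

# Absolute Average %Error: average the signed errors first, then take the
# absolute value, so over- and under-guesses partly cancel.
abs_avg = abs(sum(signed) / len(signed))

print(f"Average Absolute %Error: {avg_abs:.1%}")   # about 12.9%
print(f"Absolute Average %Error: {abs_avg:.1%}")   # about 0.7%, since errors cancel
```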

Boo! Shadow forecasts on SciCast

In time for Hallowe’en, we’ve added Shadow Forecasts and other features to help show the awesome power of combo.

  1. Shadowy: when linked forecasts affect a question, we show “Shadow Forecasts” in the history and in the trend graph legend.
  2. Pointy: the trend graph now shows one point per forecast instead of just the nightly snapshot.
  3. Chatty: Comment while you forecast.  (But not while you drive.)

Read on for news about upcoming prizes.


New Approach to Combo Forecasts…

Tonight’s release streamlines combo trades, adds some per-question rank feedback, prettifies resolutions, and disables recurring edits.


We’ve redone the approach to trading linked questions. If a question is linked to others, you can now make any desired assumptions right from the main trade screen.
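
To give a feel for what an assumption does, here is a toy sketch of conditioning on a linked yes/no question. It illustrates the idea only; the question names and numbers are made up, and SciCast’s actual combinatorial engine is more involved.

```python
# Toy joint distribution over two linked yes/no questions; numbers made up.
joint = {
    ("A=yes", "B=yes"): 0.30,
    ("A=yes", "B=no"):  0.10,
    ("A=no",  "B=yes"): 0.20,
    ("A=no",  "B=no"):  0.40,
}

def prob(event, assumption=None):
    """P(event | assumption), read off the toy joint table."""
    num = sum(p for outcome, p in joint.items()
              if event in outcome and (assumption is None or assumption in outcome))
    den = 1.0 if assumption is None else sum(
        p for outcome, p in joint.items() if assumption in outcome)
    return num / den

print(prob("A=yes"))             # 0.40 with no assumption
print(prob("A=yes", "B=yes"))    # 0.60 under the assumption that B resolves yes
```

Roughly speaking, forecasting under the assumption B=yes moves only that conditional slice of the joint, leaving your view of B itself alone.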


Scaled Continuous Question

The following shows an example of a Scaled or Continuous question:


Instead of estimating the chance of a particular outcome, you are asked to forecast the outcome in natural units like $.  Forecasts moving the estimate towards the actual outcome will be rewarded. Those moving it away will be penalized.  As with probability questions, moving toward the extremes is progressively more expensive: we have merely rescaled the usual 0%-100% range and customized the interface.
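
As a rough illustration of that rescaling (the $0 to $500 range and the mapping below are assumptions for the sketch, not the exact SciCast interface):

```python
# Sketch only: the $0-$500 range below is a made-up example question range.
LO, HI = 0.0, 500.0          # question range in natural units (here, dollars)

def to_unit(x):
    """Map a forecast in natural units onto the usual 0-1 (0%-100%) scale."""
    return (x - LO) / (HI - LO)

def from_unit(u):
    """Map a 0-1 value back into natural units for display."""
    return LO + u * (HI - LO)

print(to_unit(125.0))    # 0.25, i.e. 25% internally
print(from_unit(0.8))    # 400.0, shown to the forecaster as $400
```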


Why did my forecast do that?

Forecasters frequently want to know why their forecast had so much (or so little) effect. For example, Topic Leader jessiet recently asked:

I made a prediction just now of 10% and the new probability came down to 10%. That seems weird- that my one vote would count more than all past predictions? I assume it’s not related to the fact that I was the question author?

The quick answer is that she used Power mode, which is our market interface, and that’s how markets work: your estimate becomes the new consensus.  Sound crazy? Markets beat out most other methods over three years of live geopolitical forecasting in the IARPA ACE competition, and we ran one of those markets for two years before switching to Science & Technology.  So how can this possibly work?  Read on for (a) how it works, (b) why you should start with Safe mode, (c) the scoring rule underneath, and (d) an actual example.
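
For the impatient, a hedged sketch of (c): markets of this kind are typically built on a logarithmic market scoring rule (LMSR), where moving the consensus from p to q earns points proportional to log(q/p) if the event happens and log((1−q)/(1−p)) if it doesn’t. Your estimate becomes the consensus, but you only profit if it was an improvement. The liquidity constant and the 60% prior consensus below are made-up illustrations, not SciCast’s actual settings.

```python
import math

B = 100.0   # liquidity parameter: purely illustrative, not SciCast's setting

def lmsr_points(p_old, p_new, happened):
    """Points for moving the consensus from p_old to p_new under a
    logarithmic market scoring rule (illustrative sketch only)."""
    if happened:
        return B * math.log(p_new / p_old)
    return B * math.log((1 - p_new) / (1 - p_old))

# Hypothetical: the consensus was 60% and jessiet moves it to 10%.
print(round(lmsr_points(0.60, 0.10, happened=False)))  # about +81 if the event fizzles
print(round(lmsr_points(0.60, 0.10, happened=True)))   # about -179 if it happens anyway
```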


Comment notifications temporarily disabled

A new SciCast ad campaign has created ~1,000 registrations per day for the past couple of days.  That has doubled our forecaster community and created a lot of activity, which is great.  But it also generated a lot of email notifications for users who had opted to receive updates for new comments, and more email is not always great.

After a dozen or so complaints and a review of some comments, we have disabled email notifications until we add some more controls.
