Nice summary by longtime colleague and arch argument mapper Tim van Gelder. “The pivotal element here obviously is Track, i.e. measure predictive accuracy using a proper scoring rule.” If “ACERA” sounds familiar, it’s because they were part of our team when we were DAGGRE: they ran several experiments on and in parallel to the site.
Our school had a candy guessing contest for Hallowe’en. There were three Jars of Unusual Shape, in various sizes.
The spirit of Francis Galton demanded that I look at the data. Candy guessing, like measuring temperature, is a classic case where averaging multiple readings from different sensors is expected to do very well. Was the crowd wise? Yes.
- The unweighted average beat:
  - 67% of guessers on Jar 1
  - 78% of guessers on Jar 2
  - 97% of guessers on Jar 3, and
  - 97% of guessers overall
- The median beat:
  - 89% of guessers on Jar 1
  - 83% of guessers on Jar 2
  - 78% of guessers on Jar 3, and
  - 97% of guessers overall
Only one person beat the unweighted average, and two others were close. There were 36 people who guessed all three jars (and one anonymous blogger who guessed only one jar and was excluded from the analysis). The top guesser had an overall error of 9%, while the unweighted average had an overall error of 11%. Two other guessers came close, with an average error of 12%. The worst guessers had overall error rates greater than 100%, with the worst being 193% too high.
The unweighted average was never the best on a single jar — though on Jar 3 it was only off by 1. (The best guesser on Jar 3 was exactly correct.)
The measure I used was the overall Average Absolute %Error. The individual rankings change slightly if instead we use Absolute Average %Error, but the main result holds.
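To make the distinction concrete, here is a small sketch of the two metrics and the unweighted crowd average. The guesses and true counts below are invented for illustration, not the actual contest data.

```python
# Invented guesses (candy counts) on three jars, plus invented true counts.
guesses = {
    "alice": [150, 320, 90],
    "bob":   [200, 280, 110],
    "carol": [120, 400, 75],
}
truth = [170, 300, 95]

def pct_errors(guess):
    """Signed percent error on each jar."""
    return [(g - t) / t * 100 for g, t in zip(guess, truth)]

def avg_abs_pct_error(guess):
    """Average Absolute %Error: take |error| per jar, then average."""
    errs = pct_errors(guess)
    return sum(abs(e) for e in errs) / len(errs)

def abs_avg_pct_error(guess):
    """Absolute Average %Error: average the signed errors first, then take
    the absolute value. Over- and under-guesses on different jars can
    cancel, so this is never larger than the metric above."""
    errs = pct_errors(guess)
    return abs(sum(errs) / len(errs))

# Unweighted crowd average: the mean guess on each jar.
crowd = [sum(g[i] for g in guesses.values()) / len(guesses) for i in range(3)]
crowd_error = avg_abs_pct_error(crowd)  # typically beats most individuals
```

With these toy numbers the crowd average beats every individual guesser on Average Absolute %Error, echoing the contest result above.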
In time for Hallowe’en, we’ve added Shadow Forecasts and other features to help show the awesome power of combo.
- Shadowy: when linked forecasts affect a question, we show “Shadow Forecasts” in the history and the trend graph.
- Pointy: the trend graph now shows one point per forecast instead of just the nightly snapshot.
- Chatty: Comment while you forecast. (But not while you drive.)
Read on for news about upcoming prizes.
Tonight’s release streamlines combo trades, adds some per-question rank feedback, prettifies resolutions, and disables recurring edits.
We’ve redone the approach to trading linked questions. Now if the question is linked to other questions, you can make any desired assumptions right from the main trade screen.
The following shows an example of a Scaled or Continuous question:
Instead of estimating the chance of a particular outcome, you are asked to forecast the outcome in natural units like $. Forecasts moving the estimate towards the actual outcome will be rewarded. Those moving it away will be penalized. As with probability questions, moving toward the extremes is progressively more expensive: we have merely rescaled the usual 0%-100% range and customized the interface.
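As a rough sketch of that rescaling, assume a scaled question maps its natural-unit range linearly onto the internal 0%–100% scale. (The bounds and the linear transform here are illustrative assumptions; the actual SciCast mapping may differ, e.g. it could be logarithmic for dollar amounts.)

```python
def to_internal(value, low, high):
    """Map a forecast in natural units (e.g. $) onto the internal 0-1 scale,
    assuming a linear transform between the question's bounds."""
    return (value - low) / (high - low)

def from_internal(p, low, high):
    """Map an internal 0-1 estimate back into natural units."""
    return low + p * (high - low)

# Hypothetical question: revenue between $0M and $200M.
# A forecast of $150M lands at 0.75 on the internal scale, so pushing the
# estimate further toward $200M costs progressively more, just as pushing a
# probability toward 100% does.
p = to_internal(150, 0, 200)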
Forecasters frequently want to know why their forecast had so much (or so little) effect. For example, Topic Leader jessiet recently asked:
I made a prediction just now of 10% and the new probability came down to 10%. That seems weird- that my one vote would count more than all past predictions? I assume it’s not related to the fact that I was the question author?
The quick answer is that she used Power mode, which is our market interface, and that’s how markets work: your estimate becomes the new consensus. Sound crazy? Note that markets beat out most other methods over the past three years of live geopolitical forecasting in the IARPA ACE competition. For two years, we ran one of those markets, before we switched to Science & Technology. So how can this possibly work? Read on for (a) How it works, (b) Why you should start with Safe mode, (c) The scoring rule underneath, and (d) An actual example.
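The market behavior behind jessiet’s question can be sketched with a logarithmic market scoring rule (LMSR), the kind of rule that underlies market-style interfaces like Power mode. The liquidity parameter and bookkeeping below are illustrative assumptions, not SciCast’s exact implementation.

```python
import math

# Liquidity parameter: larger B means it takes more points to move the price.
B = 100.0

def trade_score(p_old, p_new, outcome_happens):
    """Points settled at resolution for moving the consensus probability of
    an outcome from p_old to p_new under an LMSR-style rule."""
    p0 = p_old if outcome_happens else 1 - p_old
    p1 = p_new if outcome_happens else 1 - p_new
    return B * math.log(p1 / p0)

# jessiet moves the consensus from 70% down to 10%: her estimate *becomes*
# the new consensus, but she is on the hook for the move.
gain_if_no = trade_score(0.70, 0.10, outcome_happens=False)  # rewarded
loss_if_yes = trade_score(0.70, 0.10, outcome_happens=True)  # penalized
```

Note the asymmetry: the potential loss from a bold move toward an extreme exceeds the potential gain, which is why moving toward 0% or 100% is progressively more expensive.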
A new SciCast ad campaign has created ~1,000 registrations per day for the past couple of days. That has doubled our forecaster community and created a lot of activity, which is great. But it also generated a lot of email notifications for users who had opted to receive updates for new comments, and more email is not always great.
After a dozen or so complaints and a review of some comments, we have disabled email notifications until we add some more controls.
** Updated 18-May **
Some videos have been added today (refresh https://scicast.org/dicty), so I took a second look. I don’t look through microscopes for a living, so this isn’t final. Times below are the timestamped clock time, which elapses at about 8 minutes per frame. I collected:
- TeamNum: Team number stated in the video title
- TeamName: Team name and/or PI name(s)
- t_First: time when the first cell entered the finish zone. There is some ambiguity for cells that hang out on the opening.
- Num_3hr: #cells in the finish zone by the 3hr mark; not sure how Irimia is counting cells that leave after entering
- Num_5hr: #cells in the finish zone by the 5hr mark (only 4 videos)
- Notes: salient notes, in quotes
Often there were two mazes and finish lines visible. In these cases I tried to count both finish lines. I’m not sure that’s canonical. Here are the results as I saw them (Changed format to avoid margin clipping; sorted by Num_3hr):
| TeamNum | TeamName | t_First | Num_3hr | Num_5hr | Notes |
|---|---|---|---|---|---|
| Team 12 | Kortholt | 1:36 | 14 | | 3 in at 1:36, 4 @ 1:40, at least 14 at 3h |
| Team 11 | Faix | 1:28 | 12 | | 3 in @ 1:36; I counted at least 12 entering by 3h |
| Control HL60 | | 1:42 | 9 | | 9+. I can’t get it to hold the last frame to count. |
| Team 9 HL60 | Insall | 2:14 | 5 | | 5+. Many cells got bored and went back into the maze. |
| Team 17 | Kimmel | 1:36 | 5 | | 5+. Can’t hold the last frame. |
| Caffeine Control | | 1:56 | 4 | | Last one just in time. |
| Race2Control | | 2:15 | 2 | 5 | Lots of cells, lots of movement, esp. outside the maze. |
| Racer X Neutrophils | | 2:25 | 2 | | Cell entering 2:25 jumps track next frame 2:30. Entered? |
| Team 1 | Strassman & Queller | 2:38 | 1.5 | 3 | Video starts at 1:30; 2nd cell just crossing at 3:00 |
| Team 9 | Insall | | 0 | 0 | Video starts at 1:30; hard to see, but cells appear disorganized. |
| Team 18 | Kay | 4:07 | 0 | 1 | Cells outside the maze are much faster! |
| Team 5 HL60 | Steinckwich-Besancon | | 0 | | Only 5-6 cells visible, hardly moving. |
| Team 19 HL60 | Tschirhart | | 0 | | 17 cells idly drifting around the mazes. |
| Team 7 | Bruce | | 0 | | Halfway through the maze at 3h. |
| Team 10 | Beta | | 0 | | No cells visible on the slide. |
| Team 20 | Devreotes | | 0 | | Only 1 confused cell. Did preliminary videos have the wrong label? |
| Team 9 | Insall | | 0 | | About halfway through at 3h. |
| Team 15 | Myre | | 0 | | About halfway through at 3h. |
| Team 14 | Muller-Taubenberger | | | | Irimia says cells did not get to the starting line. |
So, likely winners:
- Overall winner (most finishers at 3h): Team 12 (Kortholt) with 14+ cells. We had an email from Irimia mid-afternoon saying at that point they also thought 12 was the most likely. Second place is probably Team 11 (Faix) with 12+ cells in. Third is likely to be either the HL60 line from Team 9 (Insall), or Team 17 (Kimmel), both with at least 5 cells in by 3hr.
- Fastest: Team 11 (Faix) just edges out Teams 12 (Kortholt) and 17 (Kimmel) for the fastest single cell at 1:28. This may turn on the precise definition of finish line.
- Smartest: Sorry, I don’t have the patience to count wrong turns.
Notes: Neutrophils need to be photographed more often — the fastest ones are simply sprinting too fast to uniquely infer motion between frames.
You can still edit the Fastest and Smartest questions, at least until Irimia announces the winners: https://scicast.org/dicty.
On SciCast, we’ve posted three questions about the missing plane. Can crowdsourcing help to locate it?
Dr. Charles Twardy, Project Principal, explains the different ways to crowdsource a search. “When a community turns out to help look for a lost child, that’s crowdsourcing,” he says. “The community volunteers typically aren’t as well-prepared as the search teams, but when directed by experienced Field Team Leaders, they can greatly extend the search effort. Similarly, experimental micro-tasking sites like TomNod.com let volunteers help search piles of digital images. Call it the effort of the crowd. SciCast is about the wisdom of the crowd: weighing the vast amounts of uncertain and conflicting evidence to arrive at a group judgment, of say the relative chances of several regions or scenarios. This could be as simple as an average – a robust method with much to recommend it when judgments are independent. Or it could be something more advanced, like SciCast’s combinatorial prediction market. A market reduces double-counting, and may be better suited to the case where most of us are just mulling over the same information, but a few have real insight. The trick is to find a large and diverse crowd, and persuade them to participate.”
Following are the questions. Click any of them to make your forecast (register or login first). Also, see the discussion and background tabs of each question for more details and links to news sources.
The extended search region uses this map.
See this blog post for info on how to explore conditional probabilities.
Click here to read more about approaches to crowdsourcing Search & Rescue.