IBM Research undertook a challenge to build a computer system that could compete at the human champion level in real time on the American TV quiz show Jeopardy! The DeepQA project is aimed at illustrating how the advancement and [...] Authors: David A. Ferrucci, Eric W. Brown, and others.


Building Watson: An Overview of the DeepQA Project

Kalyanpur, Adam Lally, J. William Murdock, Eric Nyberg, John Prager, Nico Schlaefer, and Chris Welty

IBM Research undertook a challenge to build a computer system that could compete at the human champion level in real time on the American TV quiz show, Jeopardy. [...] The Jeopardy Challenge helped us address requirements that led to the [...] precision, confidence, and speed at the Jeopardy quiz show. Our results strongly suggest that [...]

The goals of IBM Research are to advance computer science by exploring new ways for computer technology to affect science, business, and society. Roughly three years ago, [...] chess-playing champion (Hsu), that also would have clear relevance to IBM business interests.

[...] This is especially the case in the enterprise where popularity is not as important an indicator of relevance and where recall can be as critical as precision. We believe advances in question-answering (QA) technology can help support professionals in critical and timely decision making in areas like compliance, health care, business integrity, business intelligence, knowledge discovery, enterprise knowledge management, security, and customer support.

[...] retrieval, natural language processing, knowledge representation and reasoning, machine learning, and computer-human interfaces.

The questions and content are ambiguous and noisy and none of the individual algorithms are perfect. Therefore, each [...] There is no expectation that any component in the system [...]

This is roughly between 1 and 6 seconds [...] a laboratory exercise. [...] more than 25 years (see the Jeopardy! Quiz Show sidebar for more information on the show). A computer system that could compete at [...] aspects of the Jeopardy Challenge. While we believe the Jeopardy Challenge [...] leading, for a computer.

[...] playing chess.

The Jeopardy Challenge

Meeting the Jeopardy Challenge requires advancing [...]

The Jeopardy! Quiz Show

The Jeopardy! [...] It features rich natural language questions covering a broad range of general knowledge [...]

The illustration shows a sample board for a [...] In the second round, the dollar values are doubled. [...] selected by a player. [...] player is equipped with a hand-held signaling button. If a player signals before the [...] allowing the other players to buzz in. [...] it important for players to know what they know [...]

For example the player may select [...] In addition, before the clue is revealed the player must wager a portion of his or her earnings. [...] else lose it. First, a category [...] total earnings. Then the clue is revealed. [...] have 30 seconds to respond. At the end of the 30 [...]

[...] again. That is, the player must answer the question, but the response must be in the form of a question.

The Questions

There are a wide variety of ways one can attempt to characterize the Jeopardy clues. Leveraging category information is [...] Jeopardy clues are straightforward assertional forms of questions. The questions [...] each question as possible interpretations. [...] what exactly is being asked for and which elements of the clue are relevant in determining the answer. Here are just a few examples. Note that while the Jeopardy! [...] Quiz Show sidebar, this transformation is trivial and for purposes of this paper we will just show the answers.

Category: General Science
Clue: When hit by electrons, a phosphor gives off electromagnetic energy in this form.
Answer: Light (or Photons)

Category: Lincoln Blogs
Clue: Secretary Chase just submitted this to me for [...]

Category: Head North
Clue: [...]
Answer: Georgia and Alabama

[...] shown to relieve the symptoms of ADD with relatively few side effects.

Jeopardy also has categories of questions that require special processing defined by the category [...] Some of them recur often enough that contestants know what they mean without [...] out what the puzzle is as the clues and answers are revealed (categories requiring explanation by the host are not part of the challenge). Examples of [...] the Before and After category, where two subclues have answers that overlap by typically one word, and the Rhyme Time category, where the two subclue answers must rhyme with one another. [...] these cases also require question decomposition. For example:

Category: Before and After Goes to the Movies
Clue: Film of a typical day in the life of the Beatles, which includes running from bloodthirsty zombie fans in a Romero classic.
Subclue 1: Film of a typical day in the life of the Beatles.
Subclue 2: Running from bloodthirsty zombie fans in a Romero classic.
Answer: [...] Night of the Living Dead

Subclue 1: Pele ball (soccer)

Some more complex clues contain multiple facts about the answer, all of which are unlikely to occur together in one place.

Clue: This archaic term for a mischievous or annoying child can also mean a rogue or scamp.

Subclue 1: This archaic term for a mischievous or annoying child.
Subclue 2: This term can also mean a rogue or scamp.

[...] so on. There are many infrequent types of puzzle categories [...]

The Jeopardy quiz show ordinarily admits two kinds of questions that IBM [...] questions and Special Instructions questions. [...]mine a correct answer. [...] Contestants are shown a picture of a B bomber [...]

Figure 1. Lexical Answer Type Frequency.

[...] the subclue can be replaced with its answer to form a new question that can more easily be answered.

Category: Diplomatic Relations
Inner subclue: The four countries in the world that [...]
Outer subclue: [...]
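The nested decomposition described above, where the inner subclue is answered first and its answer is substituted into the outer clue, can be sketched roughly as follows. This is a minimal illustration and not the DeepQA implementation: the lookup table, the latitude table, the three example cities, and every function name are hypothetical stand-ins for real question-answering components.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: a lookup table plays the role of a QA system for the
# inner subclue, and a latitude table lets us answer the outer subclue.
INNER_ANSWERS: Dict[str, List[str]] = {
    "three example capital cities": ["Oslo", "Rome", "Cairo"],
}
LATITUDE_DEG_NORTH: Dict[str, float] = {"Oslo": 59.9, "Rome": 41.9, "Cairo": 30.0}


def answer_inner(subclue: str) -> List[str]:
    """Answer the inner subclue (here: a plain table lookup)."""
    return INNER_ANSWERS[subclue]


def rewrite_outer(template: str, inner_answer: List[str]) -> str:
    """Replace the inner subclue with its answer to form a new question."""
    return template.format(inner=", ".join(inner_answer))


def answer_outer(candidates: List[str]) -> str:
    """Answer the rewritten question: pick the candidate farthest north."""
    return max(candidates, key=lambda city: LATITUDE_DEG_NORTH[city])


def solve_nested(inner_subclue: str, outer_template: str) -> Tuple[str, str]:
    """Return the rewritten outer question and its answer."""
    inner = answer_inner(inner_subclue)
    return rewrite_outer(outer_template, inner), answer_outer(inner)


question, answer = solve_nested(
    "three example capital cities",
    "Of {inner}, the one that's farthest north",
)
```

The point of the rewrite step is that the new question mentions concrete candidates, so a simpler component can answer it directly.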

Special instruction questions are those that are [...] be interpreted and solved. [...]

Category: Decode the Postal Codes
Verbal instruction from host: [...]
Clue: [...]
Answer: Virginia and Indiana

Both present very interesting challenges from an AI perspective but were put out of scope for this contest and evaluation.

[...] resources and as-is structured knowledge rather [...] and betting strategy.

Category: Decorating
Clue: [...] embroidery, often in a floral pattern, done with yarn on [...]

Category: Chess
Clue: [...]

In these cases the type of answer must be inferred by the context. The distribution of LATs has a very long tail, as [...] We found [...] distinct and explicit LATs in the 20,[...] question sample. The [...] most frequent explicit LATs cover less than 50 percent of the data. Figure 1 shows the relative frequency [...]
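The long-tail observation above is a statement about cumulative coverage: how much of the question sample the k most frequent LATs account for. A minimal sketch of that measurement follows, assuming each question has already been labeled with one explicit LAT string. The sample list and the function name are hypothetical and are not data from the paper.

```python
from collections import Counter
from typing import Iterable


def lat_coverage(lats: Iterable[str], top_k: int) -> float:
    """Fraction of questions whose LAT is among the top_k most frequent LATs."""
    counts = Counter(lats)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for _, n in counts.most_common(top_k))
    return covered / total


# Hypothetical toy sample: a few frequent LATs and many that occur only once.
sample = (
    ["he"] * 4 + ["country"] * 3 + ["city"] * 2
    + ["film", "river", "poet", "metal", "dance", "bird", "opera"]
)

distinct_lats = len(set(sample))          # number of distinct LATs in the sample
top2_coverage = lat_coverage(sample, 2)   # share covered by the 2 most frequent
top3_coverage = lat_coverage(sample, 3)   # share covered by the 3 most frequent
```

In a long-tailed distribution the coverage grows slowly as k increases, which is why targeting only the most frequent answer types leaves much of the data unaddressed.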

Figure 2. Precision Versus Percentage Attempted. Perfect confidence estimation (upper line) and no confidence estimation (lower line).

[...] whether or not Watson can win one or two games against top-ranked humans in real time. The highest amount of money earned by the end of a one- or two-game match determines the winner. [...] the task. This is because a player may decide to bet big on Daily Double or Final Jeopardy questions. [...] that all players must gamble on. [...] events where players may risk all their current [...] While potentially compelling for a public contest, a small number of games does not represent [...] While Watson is equipped with betting strategies necessary for playing full Jeopardy, from a core [...]

Percent answered is the percentage of questions it chooses to answer (correctly or incorrectly). The system chooses which questions to answer based on an estimated [...] The threshold controls the trade-off between precision and percent answered, [...] For lower thresholds, it will be more aggressive [...] Accuracy refers to the precision if all questions are answered. Figure 2 shows a plot of precision versus percent attempted curves for two theoretical systems. [...] Both systems have 40 percent accuracy, meaning they get 40 percent of all questions correct.
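The relationship between threshold, precision, and percent answered can be made concrete with a small sketch. It assumes the system answers a question whenever its estimated confidence is at or above the threshold (the exact comparison is an assumption), and the ten scored results are hypothetical toy data chosen so that accuracy is 40 percent, matching the theoretical systems described above.

```python
from typing import List, Tuple

# (estimated confidence, whether the top answer was correct)
Scored = Tuple[float, bool]


def precision_and_percent_answered(
    results: List[Scored], threshold: float
) -> Tuple[float, float]:
    """Precision over attempted questions, and the fraction attempted."""
    if not results:
        raise ValueError("results must not be empty")
    # Assumption: attempt a question when confidence >= threshold.
    attempted = [correct for conf, correct in results if conf >= threshold]
    percent_answered = len(attempted) / len(results)
    precision = sum(attempted) / len(attempted) if attempted else 0.0
    return precision, percent_answered


def accuracy(results: List[Scored]) -> float:
    """Accuracy is the precision when every question is answered."""
    return precision_and_percent_answered(results, float("-inf"))[0]


# Hypothetical toy results: 4 of 10 correct, with correct answers tending
# to receive higher confidence than incorrect ones.
toy_results: List[Scored] = [
    (0.9, True), (0.8, True), (0.7, True), (0.6, False), (0.5, False),
    (0.4, False), (0.3, True), (0.2, False), (0.1, False), (0.05, False),
]

high = precision_and_percent_answered(toy_results, 0.65)
mid = precision_and_percent_answered(toy_results, 0.45)
overall = accuracy(toy_results)
```

With informative confidence scores, raising the threshold trades percent answered for precision. If confidence carried no information, precision would stay near the 40 percent accuracy at every threshold.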

[...] a system can deliver far higher precision even with the same overall accuracy.

Figure 3. Champion Human Performance at Jeopardy.

The Competition: [...]

Figure 3 contains a graph that illustrates [...] Each point on the graph represents the performance of the winner in one Jeopardy game. As [...] between 85 percent and [...] percent precision. Ken Jennings had an unequaled winning streak in [...], in which he won 74 games in a row. Based on our analysis of those games, he [...]

A further distinction is that in these historical games the human [...] Rather the percent [...] account competition for the buzz. [...] performance will be affected by competition for the buzz [...] Human [...]ments of the Jeopardy Challenge.

Baseline Performance

Our metrics and baselines are intended to give us [...] curve), it delivered 47 percent precision, and over all the clues in the set (right end of the curve), its precision was 13 percent. The [...] framework is based on the Ephyra system, which was designed for answering TREC questions.

[...] systems to Jeopardy would improve their performance [...] Developed in part under the U. [...] using local resources. A requirement of the Jeopardy Challenge is that the system be self-contained and does not link to live web search.

The light gray line shows the performance of a system based purely on text search, using terms in the question as queries and search [...]

[...] were different than for the Jeopardy challenge. Furthermore, [...] and had a week to produce results for [...] questions. [...] identifying and integrating relevant content. The search-based system has better performance at [...]

[...] state of the art in QA (Ferrucci et al.) [...] observed that system-level advances allowing rapid integration and evaluation of new ideas and new components against end-to-end metrics were essential to our progress.

We devoted many months [...] Our investigations ran the gamut from [...] logical form analysis to shallow machine-translation-based approaches. We integrated them into the standard QA pipeline that went from question [...]