NTCIR-9 Intent Task Guidelines

Ruihua Song1, Tetsuya Sakai1, Min Zhang2, Yiqun Liu2, Makoto Kato3, Young-In Song1, Nick Craswell4

1Microsoft Research Asia, 2Tsinghua University

3Kyoto University, 4Microsoft Research

 

Welcome to the NTCIR-9 Intent Task. Our goal is to explore and evaluate technologies for mining and satisfying the different user intents behind a vague query, or for immediately satisfying a user's information need with the first system output. This document mainly covers the part for vague queries. If you are interested in the second part, please visit the One Click Access homepage.

Tentative Timetable

Corpus available: now

Call for participants: Nov 2010

NTCIR-9 task participant registration: Jan 20, 2011

Example topics available: Jan 6, 2011
(See the Released Data section. Please download the topics again, as we updated the query IDs on Apr 15, 2011.)

Evaluation tools and judgments for Chinese example topics available: Apr 15, 2011

Judgments for Japanese example topics available: Apr 29, 2011
(See the Released Data section. Please note that the run format has changed; see the Submission Format section.)

Topics (distributed by email) available: May 1, 2011 (Chinese tasks) and May 17, 2011 (Japanese tasks)

Submission site launched: May 31, 2011

CheckIntent.pl upgraded: June 22, 2011 (See the Released Data section. Please download it again.)

Submissions due (extended): 5:00 pm June 10, 2011 (Beijing Time) for Chinese tasks and 5:00 pm June 24, 2011 (Beijing Time) for Japanese tasks

Evaluation results available: Aug 22nd 2011

Paper draft due: Sep 20th 2011

NTCIR-9 conference: Dec 6th-9th 2011, Tokyo, Japan

 

To register as a participating team for NTCIR-9 INTENT:

Please register using the NTCIR-9 participant registration page.

Overview

Many web queries are short and vague. Users submitting the same query may have different intents. For an ambiguous query, users may seek different interpretations. For a query on a broad topic, users may be interested in different subtopics. Today, mining the intents underlying a query is an interesting topic for both the IR research community and commercial search engines. We therefore propose the Intent task to provide researchers with common data sets and an evaluation methodology.

 

In NTCIR-9, we explore the following two problems:

1)      How to mine underlying intents/subtopics;

2)      How to selectively diversify search results.

 

Accordingly, the Intent task consists of two subtasks: subtopic mining and document ranking. Both Chinese and Japanese datasets are available for evaluation. For each language, 100 topics will be released in the subtopic mining subtask; 50 of them will be selected and evaluated in the document ranking subtask. All runs must be generated completely automatically; no manual runs are allowed. The subtopic judgments will be used to evaluate the diversity of document ranking results. If a topic is essentially single-intent, diversifying the results may lower retrieval effectiveness. See below for details and examples.

Document Collection

The task will use SogouT as its document collection for Chinese topics. The collection contains about 130M Chinese pages together with the corresponding link graph; its size is roughly 5TB uncompressed. The data was crawled and released in Nov 2008. Further information regarding this collection can be found at http://www.sogou.com/labs/dl/t-e.html. You can also contact chenjing@sogou-inc.com directly to obtain the data set.

 

The task will use ClueWeb09-JA as its document collection for Japanese topics. The ClueWeb09-JA collection is composed of all 67M Japanese pages in the ClueWeb09 collection. We thank Prof. Jamie Callan and his team for providing the ClueWeb09-JA collection, which dramatically reduces the cost for participants. The data was crawled during January and February 2009. Further information regarding the collections can be found at http://boston.lti.cs.cmu.edu/Data/clueweb09/.

Query log

For Chinese topics, the task will use the SogouQ log data as an additional resource. The data contains one month of queries and click-through data collected by the commercial search engine Sogou in the Chinese market. The log data was collected in 2008 and is consistent with SogouT. Further information regarding the data can be found at http://www.sogou.com/labs/dl/q-e.html.

Subtopic Mining Subtask

It is important to understand the possible intents of a query before using them in different search scenarios. Result diversification is one scenario, which aims to improve the satisfaction of users with different intents. Query suggestion is another, which helps a user drill down to a subtopic of the original query. In this task, a subtopic can be an interpretation of an ambiguous query or an aspect of a faceted query. Participants are encouraged to use the document collections and the query log to automatically mine the underlying intents/subtopics. If other resources are used, please specify them in the submission form.

 

For example, "windows" may refer to Microsoft Windows software or house windows. Within the Microsoft Windows interpretation, users may be interested in different aspects, such as "Windows 7", "Windows update", etc. In this task, such different interpretations and aspects are called subtopics. Given a query, participants are encouraged to return a ranked list of subtopics that covers both popular and diverse subtopics.
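As a purely illustrative example (not a prescribed method), the following minimal Python sketch mines candidate subtopics from a query log by collecting logged queries that contain the original query string and ranking them by frequency. The log file name and the one-query-per-line format are hypothetical placeholders; please adapt the parsing to the actual SogouQ format.

# Naive subtopic-mining baseline sketch (illustrative only).
# Treat logged queries that contain the original query as candidate subtopics
# and rank them by frequency.
# NOTE: "query_log.txt" and the one-query-per-line format are hypothetical;
# adapt the parsing to the actual SogouQ log format.
from collections import Counter

def mine_subtopics(query, log_path, max_subtopics=100):
    counts = Counter()
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            logged_query = line.strip()
            # Keep refinements of the original query, e.g. "windows 7" for "windows".
            if query in logged_query and logged_query != query:
                counts[logged_query] += 1
    return [q for q, _ in counts.most_common(max_subtopics)]

# Example: mine_subtopics("windows", "query_log.txt") might return
# ["windows 7", "windows update", "house windows", ...]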

 

In the subtopic mining subtask, the submitted subtopics will be grouped into intents. A large number of assessors will then vote on the intents, and we will estimate intent probabilities based on the votes. Finally, we will evaluate submissions with several different metrics, including the D#-measures proposed in the Sakai/Song SIGIR 2011 paper (preprints available on request).

Document Ranking Subtask

We encourage participants to apply diversification algorithms selectively in ranking: we argue that diversification is not necessary for all queries, and automatically identifying ambiguous or faceted queries is an interesting problem in itself. Based on the subtopic mining results, we will select different types of queries, including those covering diverse subtopics and those focusing on a specific topic, and mix them to form the topic set for this subtask. Ranking experiments are expected to be run on the common web document collections.

 

The goals of diversification are (a) to retrieve documents that cover as many intents as possible; and (b) to rank documents that are highly relevant to more popular intents higher than those that are marginally relevant to less popular intents.
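For illustration only, the sketch below shows one well-known family of diversification methods, a greedy intent-aware re-ranker in the spirit of IA-Select/xQuAD. It is not a prescribed algorithm for this subtask; the intent probabilities and per-intent relevance estimates (assumed to lie in [0, 1]) would come from each participant's own models.

# Greedy intent-aware re-ranking sketch (IA-Select/xQuAD style; illustrative only).
# intent_probs: {intent: P(intent | query)}
# rel: {(doc, intent): estimated relevance of doc to intent, in [0, 1]}
def diversify(candidates, intent_probs, rel, k=1000):
    selected = []
    remaining = set(candidates)
    # The weight of an intent decays as already-selected documents cover it.
    uncovered = dict(intent_probs)
    while remaining and len(selected) < k:
        def gain(doc):
            return sum(p * rel.get((doc, i), 0.0) for i, p in uncovered.items())
        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
        for i in uncovered:
            uncovered[i] *= 1.0 - rel.get((best, i), 0.0)
    return selected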

 

We will evaluate systems using several different metrics, including the D#-measures proposed in the Sakai/Song SIGIR 2011 paper (preprints available on request). Per-intent graded relevance is considered. The key subtopics obtained in the first subtask will be used as the intent sets when evaluating diversity.
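As an unofficial aid for sanity checks, the rough sketch below illustrates the general shape of a D#-nDCG computation: a global gain mixes per-intent gains by intent probability (D-nDCG), which is then combined linearly with intent recall (I-rec). The cutoff, the discount, and the gamma = 0.5 default below are illustrative assumptions; the official scores will be computed with NTCIREVAL.

import math

# D#-nDCG sketch (unofficial approximation of the D#-measures):
#   GG(r)     = sum_i P(i|q) * gain_i(document at rank r)        (global gain)
#   D-nDCG@l  = sum_{r<=l} GG(r) / log2(r + 1), normalized by the ideal ranking
#   I-rec@l   = fraction of known intents covered in the top l
#   D#-nDCG@l = gamma * I-rec@l + (1 - gamma) * D-nDCG@l
def dsharp_ndcg(ranked_docs, intent_probs, gains, l=10, gamma=0.5):
    # intent_probs: {intent: P(i|q)}; gains: {(doc, intent): graded relevance level}
    def gg(doc):
        return sum(p * gains.get((doc, i), 0) for i, p in intent_probs.items())
    dcg = sum(gg(d) / math.log2(r + 1) for r, d in enumerate(ranked_docs[:l], start=1))
    ideal = sorted({d for d, _ in gains}, key=gg, reverse=True)
    idcg = sum(gg(d) / math.log2(r + 1) for r, d in enumerate(ideal[:l], start=1))
    covered = {i for d in ranked_docs[:l] for i in intent_probs if gains.get((d, i), 0) > 0}
    i_rec = len(covered) / len(intent_probs) if intent_probs else 0.0
    d_ndcg = dcg / idcg if idcg > 0 else 0.0
    return gamma * i_rec + (1 - gamma) * d_ndcg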

One Click Access Subtask

Please visit the page http://research.microsoft.com/en-us/people/tesakai/1click.aspx to learn more about this pilot task.

Submission Format

All subtopic mining and document ranking submissions must be compressed (zip). Each participating team can submit up to 5 runs for each subtask and language pair. In a subtopic mining run, please submit up to 100 subtopics per topic. In a document ranking run, please submit up to 1000 documents per topic. We may use a cutoff, e.g. 10, in evaluation. As the run files will contain Chinese or Japanese characters, they must be encoded in UTF-8.

 

The file name of each run must be in the following format:

<teamID>-<subtask>-<language>-<priority>.txt

where

<teamID> is your teamID (which must not contain "-" or "/");

<subtask> is either S (for Subtopic mining) or D (for Document ranking);

<language> is either C (for Chinese) or J (for Japanese);

<priority> is either 1, 2, 3, 4, or 5 with 1 representing the highest priority for assessment.

For example, a run from the Tsinghua University team might be called "THU-S-C-1.txt".
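The following small Python sketch checks a run file name against this convention. The regular expression is ours and unofficial; it is only a convenience for catching obvious naming mistakes before submission.

import re

# Run file name convention: <teamID>-<subtask>-<language>-<priority>.txt
# The teamID must not contain "-" or "/"; this regex is an unofficial sanity check.
RUN_NAME = re.compile(r"^[^-/]+-[SD]-[CJ]-[1-5]\.txt$")

print(bool(RUN_NAME.match("THU-S-C-1.txt")))       # True
print(bool(RUN_NAME.match("THU-TEAM-S-C-1.txt")))  # False: teamID contains "-"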

 

It is important to set priorities because we may not be able to assess all runs from a participant due to limited resources. We will assess runs with higher priority first. For example, if we can only judge two runs from each participant, we will use the runs with priorities 1 and 2.

 

The number of subtopics/documents per run and the number of runs per participant that will be judged by assessors depend on available resources. The process will be fair to all participants: the same number of runs, and the same number of subtopics/documents per run, will be collected from each participant in pooling.

 

Line 1 of a run file should contain a very brief system/algorithm description of the run, and should be in the following format:

<SYSDESC>brief system description</SYSDESC>

 

For the subtopic mining subtask, a submission consists of a single text file encoded in UTF-8. From the second line onwards, each line should be in the following format:

 

"[TopicID];0;[Subtopic];[Rank];[Score];[RunTag]\n"

 

Semicolons are used to separate the columns. For example,

0001;0;Windows Phone 7;1;0.98;ExampleRun1

0001;0;Windows 7;2;0.97;ExampleRun1

0001;0;Windows Update;3;0.9;ExampleRun1

0001;0;House Windows;4;0.85;ExampleRun1

 

Please make sure that a subtopic string does not contain any ";" or "\".
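For convenience, the unofficial sketch below writes a subtopic mining run in this format, including the SYSDESC line, the per-topic cap of 100 subtopics, and the UTF-8 encoding. CheckIntent.pl remains the authoritative format checker.

# Write a subtopic mining run file (unofficial sketch).
# subtopics: {topic_id: [(subtopic, score), ...]}, already sorted by descending score.
def write_subtopic_run(path, description, subtopics, run_tag):
    with open(path, "w", encoding="utf-8") as out:  # run files must be UTF-8
        out.write("<SYSDESC>%s</SYSDESC>\n" % description)
        for topic_id, items in sorted(subtopics.items()):
            for rank, (subtopic, score) in enumerate(items[:100], start=1):
                # Subtopic strings must not contain ";" or "\".
                assert ";" not in subtopic and "\\" not in subtopic
                out.write("%s;0;%s;%d;%s;%s\n" % (topic_id, subtopic, rank, score, run_tag))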

 

We found some problems in the Chinese subtopic mining run files, such as bad character codes, unnecessary white space, and disallowed characters (";" or "\"). We hope that participants can avoid these problems in the future.

 

We have upgraded CheckIntent.pl to detect these problems. To check the format of a run file, please use "CheckIntent.pl <ResultFile>". If a problem is detected, please use "CheckIntent.pl <ResultFile> <FixedFile>" to fix it. The script will generate a new result file, i.e. <FixedFile>, by removing bad character codes, unnecessary white space, and the disallowed "\" character. When using this command, please ignore messages like: "Wide character in print at [path]\CheckIntent.pl line 133, <RES> line xxx."

Finally, please check the fixed file again by running "CheckIntent.pl <FixedFile>". There may be other issues that the script cannot fix.

 

CheckIntent.pl does not explicitly detect garbled text, but this problem sometimes occurs together with bad character codes, so participants may discover it when checking the lines flagged by warning messages. Please note that the script cannot automatically correct visually garbled text.

 

For the document ranking subtask, a submission consists of a single text file encoded in UTF-8. From the second line onwards, each line should be in the following format:

 

"[TopicID] 0 [DocumentID] [Rank] [Score] [RunTag]\n"

 

Spaces are used to separate the columns. For example, a submission for the Japanese document ranking subtask looks like this:

0101 0 clueweb09-ja0006-97-23810 1 27.73 ExampleRun2

0101 0 clueweb09-ja0009-08-98321 2 25.15 ExampleRun2

0101 0 clueweb09-ja0003-71-19833 3 21.89 ExampleRun2

0101 0 clueweb09-ja0002-66-03897 4 13.57 ExampleRun2

 

For the Chinese collection, document IDs are different.
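A corresponding unofficial sketch for writing a document ranking run, with the SYSDESC line, the per-topic cap of 1000 documents, and UTF-8 encoding:

# Write a document ranking run file (unofficial sketch).
# rankings: {topic_id: [(doc_id, score), ...]}, already sorted by descending score.
def write_ranking_run(path, description, rankings, run_tag):
    with open(path, "w", encoding="utf-8") as out:  # run files must be UTF-8
        out.write("<SYSDESC>%s</SYSDESC>\n" % description)
        for topic_id, docs in sorted(rankings.items()):
            for rank, (doc_id, score) in enumerate(docs[:1000], start=1):
                out.write("%s 0 %s %d %s %s\n" % (topic_id, doc_id, rank, score, run_tag))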

Released Data

Please download the latest version of evaluation tool NTCIREVAL here: http://research.nii.ac.jp/ntcir/tools/ntcireval-en.html. The formats of files required for evaluation are described in README.diversity.

 

Please download the topics, judgments, and intent probability files. Most of the files are encoded in UTF-8. We recommend right-clicking and choosing "Save Target As" to download the files to your local disk. If you left-click the files, you may see strange characters due to incorrect encoding.

 

Files for ten example topics in Simplified Chinese:

Example topics (query only)

Example topics (full XML format)

Dqrels relevance judgments for the subtopic mining task (updated on May 10)

Iprob intent probabilities for the subtopic mining task

Dqrels relevance judgments for the document ranking task

Iprob intent probabilities for the document ranking task

 

Files for ten example topics in Japanese:

Example topics (query only)

Example topics (full XML format)

Dqrels relevance judgments for the subtopic mining task (updated on May 10)

Iprob intent probabilities for the subtopic mining task

Dqrels relevance judgments for the document ranking task

Iprob intent probabilities for the document ranking task

 

Please download a Perl script CheckIntent.zip (updated on August 12) to help check the format of formal runs. This script should be run with the CharsetDetector module developed by Qian Yu (http://search.cpan.org/~foolfish/Encode-Detect-CJK-2.0.2/lib/Encode/Detect/CJK.pm).

Contact Us

If you have any questions, please send email to the organizers at ntcadm-intent (the mail domain is shown as an image on the NTCIR website).

 

 

Last updated on August 12, 2011.