CALL FOR PARTICIPATION: Registration due Jan 20, 2011

One Click Access (“1CLICK”) – A Subtask of the NTCIR-9 Intent Task

Last updated: December 10, 2010 by Tetsuya Sakai

 

Outline:

The goal of this task is to realize the following scenario:

the user enters a query and clicks on the search button -

and his/her information need is immediately satisfied with the

first system output that's displayed. No need to click any further -

hence "one click access".

 

Thus we step out of the "ranked list" and "document relevance" paradigm,

and enable the IR, QA and summarization communities to solve a common and

important problem.

 

As this is Round 1 of One Click Access (“1CLICK”),

only Japanese textual output of length up to X characters will be considered.

The textual output may represent a Search Engine Result Page

(i.e. titles and snippets for a list of retrieved documents),

part of a single document, or any fragments of text.

We focus on several typical types of queries (see Input).

 

Future rounds may also look into evaluating multimedia/multi-component output,

layout within a fixed-size display area, and so on [1].

 

Language Scope:

Since this is only Round 1, we will focus on Japanese monolingual information access

(Japanese query input – Japanese text output).

 

Language Resources:

Search queries will be mined manually from Japanese mobile query logs and

Yahoo! Chiebukuro (Japanese Yahoo! Answers) [2] data.

 

Participating systems are allowed to use ANY existing web pages

as the knowledge source for producing the system output.

That is, unlike typical information access tasks in evaluation workshops,

we do not specify a particular document collection from which

the responses need to be drawn.

 

Input:

We will release at least 60 Japanese queries covering four query types (CELEBRITY, LOCAL, DEFINITION and QA)

as input to the system.  The query type tag for each query will not be disclosed to participants until the run submission

deadline. However, depending on the query type, we expect the system to return

certain types of factual information. Details are given in the nugget creation policy document

and the sample queries and nuggets document.

In addition to the actual query strings, participating systems are allowed to use

all information contained in these two documents.

 

Note that the CELEBRITY, LOCAL and DEFINITION queries are all phrasal queries,

while the QA queries are somewhat more complex natural language queries.

 

We will release a query set file, with each line containing the following two fields:

<queryID> <querystring>
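
For reference, here is a minimal sketch (in Python, for illustration only) of how a run generator might read such a query set file. It assumes the two fields are separated by the first whitespace character; adjust accordingly if the released file uses a tab.

# Illustrative sketch: read the 1CLICK query set file.
# Assumes "<queryID> <querystring>" per line, separated by the first
# whitespace character (the released file may use a tab instead).
def read_query_set(path):
    queries = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            query_id, query_string = line.split(None, 1)
            queries[query_id] = query_string
    return queries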

 

Output:

Each participating system must return, for each query, a text string designed to be displayed in an X-character (not X-byte)

output screen. For convenience we call the output string “X-string”.

In addition, an unordered set of URLs that were used for producing the X-string must be

submitted for each query. Hence, the X-string may be viewed as a multi-document summary of the contents of these URLs.

Participants should try to minimize the amount of text the user has to read

in order to find what he/she is looking for, by putting highly relevant information near the beginning of the X-string

rather than near the end.

 

We accept two types of “runs” (system output files):

 

DESKTOP runs (“D-runs”) where the first X=500 characters of each X-string will be evaluated for each query.

This is designed to approximate the top 5 results in a Web search result page (with titles and snippets)

which the user can typically see without scrolling the browser.

 

MOBILE runs (“M-runs”) where the first X=140 characters of each X-string will be evaluated for each query.

This is designed to approximate a search output on a mobile phone display.

(Incidentally, X=140 is the size of one “tweet.”)

 

When counting the number of characters, we will ignore white spaces, punctuation marks and special symbols.

Hence your output may actually contain more than X characters. Our nugget match evaluation

interface will automatically truncate your X-strings before evaluation.
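
As a rough illustration (not the official implementation), character counting and truncation might work as sketched below in Python. The exact set of characters treated as white spaces, punctuation marks and special symbols is defined by our evaluation interface; the Unicode-category test here is only an approximation.

import unicodedata

def counts_toward_limit(ch):
    # Approximation: count only letters and digits; skip whitespace,
    # punctuation and other symbols (the official definition may differ).
    return unicodedata.category(ch)[0] in ("L", "N")

def truncate_x_string(text, x):
    # Keep characters until x counted characters have been accumulated.
    counted = 0
    kept = []
    for ch in text:
        if counted >= x:
            break
        kept.append(ch)
        if counts_toward_limit(ch):
            counted += 1
    return "".join(kept)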

 

We expect participants to generate their runs completely automatically.

Please do not manually tune the system after the query set release.

 

Submission Format:

 

Each participating team can submit up to two D-runs plus two M-runs.

 

The file name of each run must be in the following format:

<teamID>-<runtype>-<priority>.txt

where

<teamID> is your teamID (which must not contain “-” or “/”);

<runtype> is either D (for DESKTOP) or M (for MOBILE);

<priority> is either 1 or 2, with 1 representing the highest priority for assessment.

For example, a run from the MSRA team might be called “MSRA-D-1.txt”.
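
As a quick sanity check, the naming convention above can be verified with a pattern such as the following (Python, illustrative only):

import re

# teamID: anything not containing "-" or "/"; runtype: D or M; priority: 1 or 2.
RUN_FILENAME = re.compile(r"^[^-/]+-(D|M)-(1|2)\.txt$")

def is_valid_run_filename(name):
    return RUN_FILENAME.match(name) is not None

# is_valid_run_filename("MSRA-D-1.txt")  -> True
# is_valid_run_filename("MSRA/D-1.txt")  -> False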

 

As the run files will contain Japanese characters, they must be encoded in UTF-8.

 

Each run file must contain one System Description line and,

for each query, one Output line followed by at least one and no more than 10 URL lines,

as described below.

 

Line 1 of a run file should contain a very brief system/algorithm description of the run,

and should be in the following format:

SYSDESC[TAB]<brief one-sentence system description in English>

where [TAB] denotes a tab character used as the separator.

Please make sure the description does not include a newline character.

 

The run file should also contain 60 Output lines, each corresponding to a formal run query.

An Output line must be in the following format:

<QueryID>[TAB]OUT[TAB]<X-string>

where, as was mentioned earlier, <X-string> should be designed for a 500-character window

for a D-run and a 140-character window for an M-run.

Please make sure the X-string does not include a newline character.

 

Additionally, the run file must contain, for each query, 1-10 URL lines.

We regard this set of URLs as the knowledge source from which the system produced the X-string.

A URL line must be in the following format:

<QueryID>[TAB]URL[TAB]<url>

These URLs will be used primarily for investigating what kinds of knowledge

sources the participants have used.

 

To sum up, the contents of a run file should look something like this:

SYSDESC          Bing API used for obtaining top 10 snippets; tfidf used for sentence selection

1C1-0001         OUT     これは例です

1C1-0001         URL      http://www.thuir.org/1click/ntcir9/

1C1-0001         URL      http://research.nii.ac.jp/ntcir/index-en.html

   :                     :              :

1C1-0060         OUT     これも例です

1C1-0060         URL      http://research.nii.ac.jp/ntcir/index-en.html
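
For reference, a run file in this format could be produced with a sketch like the following (Python, illustrative only; the function and variable names are hypothetical):

def write_run_file(path, sysdesc, results):
    # results: dict mapping queryID -> (x_string, list_of_urls).
    # The X-string and the description must not contain tabs or newlines,
    # and each query should have between 1 and 10 URL lines.
    with open(path, "w", encoding="utf-8") as f:
        f.write("SYSDESC\t%s\n" % sysdesc)
        for query_id, (x_string, urls) in sorted(results.items()):
            f.write("%s\tOUT\t%s\n" % (query_id, x_string))
            for url in urls[:10]:
                f.write("%s\tURL\t%s\n" % (query_id, url))

# Example:
# write_run_file("MSRA-D-1.txt",
#                "Bing API used for obtaining top 10 snippets; tfidf used for sentence selection",
#                {"1C1-0001": ("これは例です", ["http://www.thuir.org/1click/ntcir9/"])})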

 

Evaluation Methods:

We plan to design a nugget-based evaluation method [3,4,5].

As mentioned in the nugget creation policy document,

we will try to make the nuggets as factual and time-insensitive as possible.

Note, in particular, that facts that occurred after December 31, 2010 will be ignored in the evaluation.

That is, we are looking for established facts, not breaking news.

 

To evaluate an X-string, we will provide a web-based nugget match evaluation interface for

comparing it with a list of nuggets. Thus nugget matches will be identified manually.

 

Unlike traditional summarization and question answering evaluation,

we plan to utilize the position of each matched nugget for computing evaluation metrics.

That is, a nugget found near the end of the X-string will be discounted compared to

the same one found near the beginning of another X-string.

Redundancy will be penalized.

The goal is to cover the most relevant nuggets within an X-character window,

AND order them so as to minimize the amount of text the user has to read.
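
The exact metric will be specified by the organizers; purely as an illustration of position-based discounting with redundancy removal, a score could be computed along the following lines (Python sketch; nugget weights come from the graded assessments, and only the earliest match of each nugget is credited):

def positional_nugget_score(matches, nugget_weights, x):
    # matches: (nugget_id, offset) pairs identified manually by assessors,
    #   where offset is the character position of the match in the X-string.
    # nugget_weights: dict mapping nugget_id -> weight from graded assessments.
    # x: size of the evaluation window (500 for D-runs, 140 for M-runs).
    # Redundancy handling: only the earliest match of each nugget counts.
    earliest = {}
    for nugget_id, offset in matches:
        if nugget_id not in earliest or offset < earliest[nugget_id]:
            earliest[nugget_id] = offset
    score = 0.0
    for nugget_id, offset in earliest.items():
        # Linear discount: credit decays from 1 at the start of the window to 0 at its end.
        discount = max(0.0, 1.0 - float(offset) / x)
        score += nugget_weights.get(nugget_id, 0.0) * discount
    # A real metric would normalize this, e.g. against an ideal ordering of the nuggets.
    return score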

 

Please note that participants are expected to take part in the X-string evaluation process,

as shown in the schedule below. However, for participating teams without a Japanese native speaker,

this requirement will be waived.

 

Tentative Schedule:


 

Oct 2010                     queries and nuggets constructed

Nov 2010                     first call for participation

Dec 2010                     nuggets graded by multiple assessors

Jan 20, 2011                NTCIR-9 task participant registration deadline

                                    (but we recommend registering by Dec 20, 2010, as

                                    the NTCIR office will be closed Dec 23 – Jan 4.)

May 2011                    queries released/run submission

                                    (runs must be submitted within one week after the query set release)

Jun-Jul 2011                organizers and participants cross-assess runs using a web-based tool

                                    (for a given X-string, identify which nuggets are present in it and where they are)

Aug 2011                     first evaluation results released/rebuttal period

                                    (report nuggets that should be added to the test collection – if necessary)

Sep 2011                     nugget revision – if necessary

Oct 2011                     final evaluation results released – if necessary

Nov 2011                     camera-ready papers due

                                    (participants must write a separate 1CLICK paper even if they also participate in

other INTENT subtasks)

Dec 6-9 2011               NTCIR-9 final meeting

 

To register as a participating team for NTCIR-9 1CLICK:

Please register using the NTCIR-9 participant registration page.

Make sure you check the “One Click Access” box.

Note that 1CLICK is a subtask of the INTENT task – please consider

signing up for the other INTENT subtasks as well.

 

Organizers:

Tetsuya Sakai [tesakai@microsoft.com] (Microsoft Research Asia)

Makoto Kato [kato@dl.kuis.kyoto-u.ac.jp] (Kyoto University)

Youngin Song [yosong@microsoft.com] (Microsoft Research Asia)

Ruihua Song [rsong@microsoft.com] (Microsoft Research Asia)

Min Zhang [z-m@tsinghua.edu.cn] (Tsinghua University)

Yiqun Liu [yiqunliu@tsinghua.edu.cn] (Tsinghua University)

Nick Craswell [nickcr@microsoft.com] (Microsoft)

 

References:

[1] Bailey, P., Craswell, N., White, R., Chen, L., Satyanarayana, A. and Tahaghoghi, S.M.M.:

Evaluating Whole-Page Relevance,

ACM SIGIR 2010 Proceedings, 2010.

[2] Ishikawa, D., Sakai, T. and Kando, N.:

Overview of the NTCIR-8 Community QA Pilot Task (Part I): The Test Collection and the Task,

NTCIR-8 Proceedings, June 2010.

[3] Nenkova, A., Passonneau, R. and McKeown, K.:

The Pyramid Method: Incorporating Human Content Selection Variation in Summarization Evaluation,

ACM Transactions on Speech and Language Processing, Volume 4, Number 2, Article 4, 2007.

[4] Lin, J. and Demner-Fushman, D.:

Will Pyramids Built of Nuggets Topple Over?

Proceedings of the HLT Conference of the North American Chapter of the ACL, pp.383-390, 2006.

[5] Mitamura, T., Nyberg, E., Shima, H., Kato, T., Mori, T., Lin, C.-Y., Song, R., Lin, C.-J., Sakai, T., Ji, D. and Kando, N.:

Overview of the NTCIR-7 ACLIA Tasks: Advanced Cross-Lingual Information Access,

NTCIR-7 Proceedings, December 2008.

 

Questions?:

Please contact Tetsuya Sakai (see above).