Web Retrieval: The Role of Users
Ricardo Baeza-Yates, Yahoo! Research
Yoelle Maarek, Yahoo! Research
Abstract
Web retrieval methods have evolved through three major steps in the last decade or so. They started from standard document-centric IR in the early days of the Web, then made a major step forward by leveraging the structure of the Web, using link analysis techniques for both crawling and ranking challenges. A more recent, no less important, though perhaps more discreet, step forward has been to bring the user into this equation in two ways:
Implicitly, through the analysis of usage data captured in query logs, and session and click information in general; the goal here being to improve ranking as well as to measure users' happiness and engagement.
Explicitly, by offering novel interactive features; the goal here being to better answer users’ needs.
This half-day tutorial will cover the user-related challenges associated with the implicit and explicit role of users in Web retrieval. More specifically, we will review and discuss challenges associated with two types of activities, namely:
Usage data analysis and metrics - It is critical to monitor how users take advantage of and interact with Web retrieval systems, as this implicit relevance feedback, aggregated at a large scale, can approximate quite accurately the level of success of a given feature. Here we have to consider not only click statistics but also the time spent on a page, the number of actions per session, etc.
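To make these metrics concrete, the following is a minimal sketch of how such implicit-feedback signals might be aggregated from a query log. The record layout (`session_id`, `query`, `clicked`, `dwell_seconds`) and the function name are hypothetical, chosen purely for illustration; real logs are far richer.

```python
from collections import defaultdict

# Hypothetical toy query-log records: (session_id, query, clicked, dwell_seconds)
LOG = [
    ("s1", "jaguar", True, 45.0),
    ("s1", "jaguar car", True, 120.0),
    ("s2", "python docs", False, 0.0),
    ("s2", "python tutorial", True, 30.0),
    ("s2", "python tutorial pdf", True, 200.0),
]

def engagement_metrics(log):
    """Aggregate simple implicit-feedback metrics over a query log:
    click-through rate, average dwell time on clicked results,
    and average number of actions (queries) per session."""
    clicks = sum(1 for _, _, clicked, _ in log if clicked)
    ctr = clicks / len(log)  # fraction of queries receiving a click
    avg_dwell = (sum(d for _, _, c, d in log if c) / clicks) if clicks else 0.0
    per_session = defaultdict(int)
    for sid, _, _, _ in log:
        per_session[sid] += 1
    actions = sum(per_session.values()) / len(per_session)
    return {"ctr": ctr, "avg_dwell": avg_dwell, "actions_per_session": actions}
```

Aggregated over millions of sessions, even such crude statistics become a usable proxy for the success of a feature.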
User interaction - Given the intrinsic problems posed by the Web, the key challenge for the user is to conceive a good query to submit to the search system, one that leads to a manageable and relevant answer. The retrieval system must answer search requests quickly and return relevant results, even for poorly formulated queries, as is common on the Web. Web retrieval engines thus interact with the user at two key stages:
Expressing a query: Human beings have needs or tasks to accomplish, which are frequently not easy to express as “queries”. Queries, even when expressed in a more natural manner, are just a reflection of human needs and are thus, by definition, imperfect. This phenomenon could be compared to Plato’s cave metaphor, where shadows are mistaken for reality.
Interpreting results: Even if the user is able to perfectly express a query, the answer might be split over thousands or millions of Web pages, or not exist at all. In this context, numerous questions need to be addressed. Examples include: How do we handle a large answer? How do we rank results? How do we select the documents that really are of interest to the user? Even in the case of a single candidate document, the document itself could be large. How do we browse such documents efficiently?
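As a toy illustration of how a large answer is reduced to something manageable, the sketch below scores documents by naive term overlap with the query and keeps only the top-k; the function name and the scoring scheme are our own illustrative choices, not the method of any particular engine.

```python
def rank_results(query, docs, k=10):
    """Rank documents by a naive, length-normalized term-overlap score
    and keep only the top k, turning a potentially huge answer set
    into a short, ordered result list."""
    q_terms = set(query.lower().split())
    scored = []
    for doc_id, text in docs.items():
        terms = text.lower().split()
        overlap = sum(1 for t in terms if t in q_terms)
        if overlap:
            # Normalize by document length so short, focused pages
            # are not dominated by long, diffuse ones.
            scored.append((overlap / len(terms), doc_id))
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]
```

Production rankers replace this overlap score with link analysis, usage signals, and learned models, but the pipeline shape (score, sort, truncate) is the same.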