WSDM2011 Tutorials

Tutorial Co-chairs

  • Qing Li, City University of Hong Kong, Hong Kong

Tutorial Information

Rooms: Tang Room II, 3/F; Ming Room I, 4/F
Morning Sessions

8:30 am to 12:00 pm
T-AM1
Crowdsourcing 101: Putting the "Wisdom of the Crowd" to Work for You.

Omar Alonso and Matthew Lease
T-AM2
Introduction to Display Advertising.

Andrei Broder, Vanja Josifovski and Jayavel Shanmugasundaram
Afternoon Sessions

1:30 pm to 5:00 pm
T-PM1
Exploiting Statistical and Relational Information on the Web and in Social Media.

Lise Getoor and Lilyana Mihalkova
T-PM2
Web Retrieval: The Role of Users.

Ricardo Baeza-Yates and Yoelle Maarek

T-AM1

Crowdsourcing has emerged in recent years as an exciting new avenue for leveraging the tremendous potential and resources of today’s digitally connected, diverse, distributed workforce. Generally speaking, crowdsourcing describes the outsourcing of tasks to a large group of people instead of assigning them to an in-house employee or contractor. Crowdsourcing platforms such as Amazon Mechanical Turk and CrowdFlower have gained particular attention as active online marketplaces for reaching and tapping into this large and still under-utilized workforce. Crowdsourcing offers intriguing new opportunities for accomplishing different kinds of tasks or achieving broader participation than previously possible, as well as completing standard tasks more accurately, in less time, and at lower cost. Unlocking the potential of crowdsourcing in practice, however, requires a tripartite understanding of principles, platforms, and best practices. This tutorial will introduce the opportunities and challenges of crowdsourcing while discussing these three issues, giving attendees a basic foundation to begin applying crowdsourcing to their own tasks.

T-AM2

Web advertising supports a large swath of the Internet ecosystem. It brings revenue to countless publishers that rent space on their pages for advertising, from small mom-and-pop shops to major search engines. It also provides valuable traffic to numerous commercial Web sites and has fueled the development of Web search engines. Today, Web advertising increasingly affects the world outside the Internet by shaping the attitudes of numerous users. Computational advertising is a new scientific discipline that aims to formalize the problem of finding the best ad for a given user in a given context. In traditional advertising, the number of venues is small, the cost per venue is high, and little or no personalization is possible (as, for example, in print magazines). In contrast, in online advertising there are billions of opportunities (page views) and hundreds of millions of ads, and it is possible to provide personalization with quantifiable results. This brings advertising into the realm of the other “computational” sciences. An overview of the current state of computational advertising can be found in http://msande239.stanford.edu/lectures/lecture-01.pdf.

Display advertising is one of the two major advertising channels on the Web (the other being search advertising). Display advertising on the Web is usually done with graphical ads placed on publishers’ Web pages. There is no explicit user query, and ad selection is based on the page where the ad is placed (contextual targeting) or on the user’s past activities (behavioral targeting). In both cases, sophisticated text analysis and learning algorithms are needed to provide relevant ads to the user.

Display advertising includes both a brand awareness component, where the aim of the advertiser is to promote awareness of a brand or a product, and a direct response component, where the aim of the advertiser is a click or conversion that leads to a visit to the advertiser’s Web site or other downstream economic activity. In addition, advertisers can choose one of several payment types: CPM (Cost Per Mille, that is, per 1000 impressions/user visits), CPC (Cost Per Click), or CPA (Cost Per Action/Conversion, which may involve, for instance, filling out a form or making an actual purchase). Dealing with multiple objectives and payment types again requires sophisticated learning algorithms to enable conversion and comparison between the payment types.
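One common way to compare bids across payment types is to convert each into an expected revenue per 1000 impressions (often called eCPM). The sketch below is only an illustration under stated assumptions, not the method presented in the tutorial: the `Bid` class, the advertiser labels, the bid amounts, and the click and action probabilities are all hypothetical, and in practice the probabilities would come from a learned prediction model.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    advertiser: str                     # hypothetical label
    payment_type: str                   # "CPM", "CPC", or "CPA"
    amount: float                       # dollars per 1000 impressions, per click, or per action
    p_click: float = 0.0                # assumed estimate of P(click | impression)
    p_action_given_click: float = 0.0   # assumed estimate of P(action | click)


def expected_cpm(bid: Bid) -> float:
    """Expected revenue per 1000 impressions for a bid of any payment type."""
    if bid.payment_type == "CPM":
        # Already priced per 1000 impressions.
        return bid.amount
    if bid.payment_type == "CPC":
        # Paid only on a click: scale by the click probability.
        return 1000.0 * bid.amount * bid.p_click
    if bid.payment_type == "CPA":
        # Paid only on an action: scale by P(click) * P(action | click).
        return 1000.0 * bid.amount * bid.p_click * bid.p_action_given_click
    raise ValueError(f"unknown payment type: {bid.payment_type}")


# Illustrative numbers only.
bids = [
    Bid("A", "CPM", 2.00),
    Bid("B", "CPC", 0.50, p_click=0.005),
    Bid("C", "CPA", 20.00, p_click=0.005, p_action_given_click=0.01),
]

# Rank the bids on a common scale, highest expected revenue first.
ranked = sorted(bids, key=expected_cpm, reverse=True)
for bid in ranked:
    print(f"{bid.advertiser} ({bid.payment_type}): {expected_cpm(bid):.2f}")
```

With these made-up numbers the CPC bid ranks first, even though its nominal price is the lowest, because the comparison depends on the estimated click and action rates. This is why the quality of those estimates matters so much.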

Finally, in display advertising, advertisers can choose to buy ads on a guaranteed basis many months in advance (these are typically CPM buys). For instance, an advertiser can request 100 million impressions during Super Bowl 2011, and the publisher guarantees these visits ahead of time (even though the users have not actually shown up!). In essence, purchasing on a guaranteed basis is like purchasing goods on a futures market. Advertisers can also choose to buy on a non-guaranteed basis (these can be CPM, CPC, or CPA buys), in which case they pay only for each impression, click, or conversion. Many of the mechanisms required to support these forms of buying, such as traffic forecasting, ad selection, and pricing, are just starting to attract the attention of the research community, and there is ample opportunity for impactful research in this area.

T-PM1

The popularity of Web 2.0, characterized by a proliferation of social media sites, and Web 3.0, with more richly semantically annotated objects and relationships, brings to light a variety of important prediction, ranking, and extraction tasks. The input to these tasks is often best seen as a (noisy) multi-relational graph, such as the graph of the Web itself; the click graph, defined by user interactions with Web sites; and the social graph, defined by friendships and affiliations on social media sites.

This tutorial will provide an overview of statistical relational learning and inference techniques, motivating and illustrating them using web and social media applications. We will start by briefly surveying some of the sources of statistical and relational information on the web and in social media and will then dedicate most of the tutorial time to an introduction to representations and techniques for learning and reasoning with multi-relational information, viewing them through the lens of web and social media domains. We will end with a discussion of current trends and related fields, such as privacy in social networks and probabilistic databases.

T-PM2

Web retrieval methods have evolved through three major steps in the last decade or so. They started from standard document-centric IR in the early days of the Web, then made a major step forward by leveraging the structure of the Web, using link analysis techniques to address both crawling and ranking challenges. A more recent step, no less important but perhaps more discreet, has been to bring the user into the equation in two ways:

  • Implicitly, through the analysis of usage data captured by query logs, and session and click information in general; the goal here is to improve ranking as well as to measure users’ happiness and engagement.
  • Explicitly, by offering novel interactive features; the goal here is to better answer users’ needs.

This half-day tutorial will cover the user-related challenges associated with the implicit and explicit roles of users in Web retrieval. More specifically, we will review and discuss challenges associated with two types of activities, namely: