the simple Goto scheme; the process entails a blending of ideas from IR and microeconomics, and is beyond the scope of this book. For advertisers, understanding how search engines do this ranking and how to allocate marketing campaign budgets to different keywords and to different sponsored search engines has become a profession known as search engine marketing (SEM). The inherently economic motives underlying sponsored search give rise to attempts by some participants to subvert the system to their advantage.

This can take many forms, one of which is known as click spam. There is currently no universally accepted definition of click spam. It refers (as the name suggests) to clicks on sponsored search results that are not from bona fide search users.

For instance, a devious advertiser may attempt to exhaust the advertising budget of a competitor by clicking repeatedly (through the use of a robotic click generator) on that competitor's sponsored search advertisements. Search engines face the challenge of discerning which of the clicks they observe are part of a pattern of click spam, to avoid charging their advertiser clients for such clicks.

Exercise 19.5 The Goto method ranked advertisements matching a query by bid: the highest-bidding advertiser got the top position, the second-highest the next, and so on. What can go wrong with this when the highest-bidding advertiser places an advertisement that is irrelevant to the query? Why might an advertiser with an irrelevant advertisement bid high in this manner?

Exercise 19.6 Suppose that, in addition to bids, we had for each advertiser their click-through rate, the ratio of the historical number of times users click on their advertisement to the number of times the advertisement was shown. Suggest a modification of the Goto scheme that exploits this data to avoid the problem in Exercise 19.5 above.
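To make the quantities in these exercises concrete, the following Python sketch computes the click-through rate as defined above and ranks advertisements purely by bid, as in the Goto method. It also shows one possible way of combining the two signals, ordering by bid multiplied by click-through rate; this is only an illustrative assumption, not the answer the exercise prescribes nor the method of any particular search engine. The `Ad` class, the function names, and the sample figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    advertiser: str
    bid: float        # amount the advertiser offers to pay per click
    clicks: int       # historical number of clicks on the advertisement
    impressions: int  # historical number of times the advertisement was shown


def click_through_rate(ad: Ad) -> float:
    """Ratio of historical clicks to historical impressions (0 if never shown)."""
    return ad.clicks / ad.impressions if ad.impressions else 0.0


def rank_by_bid(ads: list[Ad]) -> list[Ad]:
    """Goto-style ordering: highest bid first, relevance ignored."""
    return sorted(ads, key=lambda ad: ad.bid, reverse=True)


def rank_by_bid_times_ctr(ads: list[Ad]) -> list[Ad]:
    """Assumed variant: order by bid weighted by click-through rate."""
    return sorted(ads, key=lambda ad: ad.bid * click_through_rate(ad), reverse=True)


# Made-up data: advertiser A bids high but is almost never clicked.
ads = [
    Ad("A", bid=2.00, clicks=1, impressions=1000),
    Ad("B", bid=1.00, clicks=50, impressions=1000),
    Ad("C", bid=0.50, clicks=80, impressions=1000),
]

print([ad.advertiser for ad in rank_by_bid(ads)])
print([ad.advertiser for ad in rank_by_bid_times_ctr(ads)])
```

Under pure bid ranking the rarely clicked advertisement from A takes the top position; under the assumed weighted score it drops to the bottom, because users have historically ignored it.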

19.4 The search user experience

It is crucial that we understand the users of web search as well. This is again a significant change from traditional IR, where users were typically professionals with at least some training in the art of phrasing queries over a well-authored collection whose style and structure they understood well. In contrast, web search users tend not to know (or care) about the heterogeneity of web content, the syntax of query languages, and the art of phrasing queries; indeed, a mainstream tool (as web search has become) should not place such onerous demands on billions of people.

A range of studies has concluded that the average number of keywords in a web search is somewhere between two and three. Syntax operators (Boolean connectives, wildcards, etc.) are seldom used, again a result of the composition of the audience: normal people, not information scientists.

It is clear that the more user traffic a web search engine can attract, the more revenue it stands to earn from sponsored search. How do search engines differentiate themselves and grow their traffic? Here, Google identified two principles that helped it to grow at the expense of its competitors: (i) a focus on relevance, specifically precision rather than recall in the first few results; and (ii) a user experience that is lightweight, meaning that both the search query page and the search results page are uncluttered and almost entirely textual, with very few graphical elements. The effect of the first was simply to save users time in locating the information they sought.

The effect of the second is to provide a user experience that is extremely responsive, or at any rate not bottlenecked by the time to load the search query or results page.
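The distinction between precision and recall in the first few results can be illustrated with a small Python sketch. It assumes that each result in a ranked list has already been judged relevant or not; the function names, the judgments, and the total count of relevant pages are hypothetical.

```python
def precision_at_k(relevance: list[bool], k: int) -> float:
    """Fraction of the top k results that are relevant."""
    if k <= 0:
        return 0.0
    return sum(relevance[:k]) / k


def recall_at_k(relevance: list[bool], k: int, total_relevant: int) -> float:
    """Fraction of all relevant documents that appear in the top k results."""
    if total_relevant <= 0:
        return 0.0
    return sum(relevance[:k]) / total_relevant


# Hypothetical judgments for the first five results of one query,
# assuming 20 relevant pages exist on the web for it.
judged = [True, True, False, True, False]

p3 = precision_at_k(judged, 3)                    # 2 of the top 3 are relevant
r3 = recall_at_k(judged, 3, total_relevant=20)    # 2 of 20 relevant pages found
print(p3, r3)
```

A user who reads only the first few results benefits from a high value of the first measure even when the second is very low, which is the trade-off principle (i) describes.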
