Web search basics

These innocuous features have contributed enormously to the growth of the Web, so it is worthwhile to examine them further. The basic operation is as follows: a client (such as a browser) sends an http request to a web server. The browser specifies a URL (for Uniform Resource Locator) such as http://www.stanford.edu/home/atoz/contact.html.

In this example URL, the string http refers to the protocol to be used for transmitting the data. The string www.stanford.edu is known as the domain and specifies the root of a hierarchy of web pages (typically mirroring a filesystem hierarchy underlying the web server). In this example, /home/atoz/contact.html is a path in this hierarchy with a file contact.html that contains the information to be returned by the web server in response to this request.

The HTML-encoded file contact.html holds the hyperlinks and the content (in this instance, contact information for Stanford University), as well as formatting rules for rendering this content in a browser. Such an http request thus allows us to fetch the content of a page, something that will prove to be useful to us for crawling and indexing documents (Chapter 20).
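The decomposition of a URL into protocol, domain, and path can be sketched with Python's standard urllib; the URL is the chapter's example, while the parsing step itself is only an illustration:

```python
from urllib.parse import urlparse

# Split the example URL into the components described above:
# protocol (scheme), domain (netloc), and path within the
# server's page hierarchy.
url = "http://www.stanford.edu/home/atoz/contact.html"
parts = urlparse(url)

print(parts.scheme)  # protocol used to transmit the data -> "http"
print(parts.netloc)  # domain: root of the page hierarchy -> "www.stanford.edu"
print(parts.path)    # path to the file in that hierarchy -> "/home/atoz/contact.html"
```

A crawler performs exactly this decomposition before contacting the host named in the domain component.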

The designers of the first browsers made it easy to view the HTML markup tags on the content of a URL. This simple convenience allowed new users to create their own HTML content without extensive training or experience; rather, they learned from example content that they liked. As they did so, a second feature of browsers supported the rapid proliferation of web content creation and usage: Browsers ignored what they did not understand.

This did not, as one might fear, lead to the creation of numerous incompatible dialects of HTML. What it did promote was amateur content creators who could freely experiment with and learn from their newly created web pages without fear that a simple syntax error would bring the system down. Publishing on the Web became a mass activity that was not limited to a few trained programmers, but rather open to tens and eventually hundreds of millions of individuals.

For most users and for most information needs, the Web quickly became the best way to supply and consume information on everything from rare ailments to subway schedules. The mass publishing of information on the Web is essentially useless unless this wealth of information can be discovered and consumed by other users. Early attempts at making web information discoverable fell into two broad categories: (i) full-text index search engines such as Altavista, Excite, and Infoseek and (ii) taxonomies populated with web pages in categories, such as Yahoo! The former presented the user with a keyword search interface supported by inverted indexes and ranking mechanisms building on those introduced in earlier chapters.
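The inverted indexes behind those keyword search interfaces can be sketched in a few lines; the documents and query below are invented for illustration:

```python
from collections import defaultdict

# A minimal sketch of the inverted index underlying early full-text
# web search engines: each term maps to a postings list of document
# IDs; a conjunctive keyword query intersects the postings.
docs = {
    1: "stanford university contact information",
    2: "subway schedules and maps",
    3: "stanford subway directions",
}

index = defaultdict(list)
for doc_id, text in sorted(docs.items()):
    for term in set(text.split()):
        index[term].append(doc_id)

def search(query):
    """Return IDs of documents containing every query term."""
    postings = [set(index[t]) for t in query.split()]
    return sorted(set.intersection(*postings)) if postings else []

print(search("stanford subway"))  # -> [3]
```

Ranking mechanisms such as those of the earlier chapters would then order the matching documents rather than merely listing them.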

The latter allowed the user to browse through a hierarchical tree of category labels. Although this is at first blush a convenient and intuitive metaphor for finding web pages, it has a number of drawbacks: First, accurately classifying web pages into taxonomy tree nodes is for the most part a manual editorial process, which is difficult to scale with the size of the Web. Arguably, we only need to have high-quality web pages in the taxonomy, with only the best web pages for each category. However, just discovering these and classifying them accurately and consistently into the taxonomy entails significant human effort. Furthermore, for a user to effectively discover web pages classified into the nodes of the taxonomy tree, the user's idea of what subtree(s) to seek for a particular topic should match that of the editors performing the classification.

This quickly becomes challenging as the size of the taxonomy grows; the Yahoo! taxonomy tree surpassed 1,000 distinct nodes fairly early on. Given these challenges, the popularity of taxonomies declined over time, even though variants (such as the Open Directory Project) sprang up with subject matter experts collecting and annotating web pages for each category.
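The browsing metaphor can be sketched as a tree of category labels with pages filed at nodes; the labels and page names below are invented for illustration, and real directories held many thousands of editor-curated nodes:

```python
# A toy taxonomy: each node maps a category label to the pages
# filed there and its subcategories.
taxonomy = {
    "Recreation": {
        "pages": [],
        "children": {
            "Travel": {"pages": ["subway-schedules.html"], "children": {}},
        },
    },
    "Health": {
        "pages": ["rare-ailments.html"],
        "children": {},
    },
}

def browse(path):
    """Follow a path of category labels down the tree and return
    the pages filed at the final node."""
    node = {"pages": [], "children": taxonomy}
    for label in path:
        node = node["children"][label]
    return node["pages"]

print(browse(["Recreation", "Travel"]))  # -> ['subway-schedules.html']
```

A user who looks for subway schedules under any other subtree finds nothing, which is precisely the mismatch between the user's mental taxonomy and the editors' that the text describes.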

The first generation of web search engines transported classical search techniques such as those in the preceding chapters to the web domain, focusing on the challenge of scale. The earliest web search engines had to contend with indexes containing tens of millions of documents, which was a few orders of magnitude larger than any prior information retrieval (IR) system in the public domain. Indexing, query serving, and ranking at this scale required the harnessing together of tens of machines to create highly available systems, again at scales not witnessed hitherto in a consumer-facing search application.
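One simple way to harness many machines, sketched below under assumptions not detailed in the text, is to partition documents across index shards: each document is assigned to a shard by its ID, each shard indexes its own documents, and a query is broadcast to all shards and the results merged. The shard count and documents are illustrative:

```python
# Document-partitioned indexing across NUM_SHARDS machines
# (modeled here as in-process dictionaries): term -> postings list.
NUM_SHARDS = 4
shards = [dict() for _ in range(NUM_SHARDS)]

def add_document(doc_id, text):
    """Index a document on the shard its ID hashes to."""
    index = shards[doc_id % NUM_SHARDS]
    for term in set(text.split()):
        index.setdefault(term, []).append(doc_id)

def query(term):
    """Broadcast the term to every shard and merge the postings."""
    results = []
    for index in shards:
        results.extend(index.get(term, []))
    return sorted(results)

add_document(0, "stanford contact")
add_document(1, "subway schedules")
add_document(5, "stanford subway")
print(query("stanford"))  # -> [0, 5]
```

Because each shard holds only a fraction of the collection, both index construction and query serving parallelize across machines, and a failed shard degrades rather than destroys availability.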

The first generation of web search engines was largely successful at solving these challenges while continually indexing a significant fraction of the Web, all the while serving queries with subsecond response times. However, the quality and relevance of web search results left much to be desired owing to the idiosyncrasies of content creation on the Web that we discuss in Section 19.2.

This necessitated the invention of new ranking and spam-fighting techniques to ensure the quality of the search results. Although classical IR techniques (such as those covered earlier in this book) continue to be necessary for web search, they are not by any means sufficient. A key aspect (developed further in Chapter 21) is that whereas classical techniques measure the relevance of a document to a query, there remains a need to gauge the authoritativeness of a document based on cues such as which website hosts it.
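One way to picture this distinction is a final score that blends per-query relevance with a query-independent authority signal; the documents, scores, and the linear combination below are purely illustrative, not the ranking method of any real engine:

```python
# Blend a query-dependent relevance score with a query-independent
# authority score (e.g., one derived from which site hosts the page).
docs = [
    {"url": "http://example.edu/a", "relevance": 0.9, "authority": 0.2},
    {"url": "http://example.org/b", "relevance": 0.7, "authority": 0.9},
]

ALPHA = 0.5  # weight on relevance versus authority

def score(doc):
    return ALPHA * doc["relevance"] + (1 - ALPHA) * doc["authority"]

ranked = sorted(docs, key=score, reverse=True)
print([d["url"] for d in ranked])
```

With this weighting, the slightly less relevant but far more authoritative page ranks first, which is the behavior pure classical relevance scoring cannot produce.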
