Chapter 8: How do search engines work?
  

- Internet search engines are programs that search and retrieve information on the web. Most of them use a crawler-indexer architecture and depend on their crawler modules. Crawlers, also referred to as spiders, are small programs that browse the web.
- Crawlers are given an initial set of URLs whose pages they retrieve. They extract the URLs that appear on those pages and pass them to the crawl control module, which decides which pages to visit next and hands their URLs back to the crawlers (see the first sketch after this list).
- The pages covered by different search engines vary with the algorithms they use. Some search engines are programmed to crawl sites on a particular topic, while the crawlers of others visit as many sites as possible.
- The crawl control module may use the link graph of a previous crawl, or observed usage patterns, to guide its crawling strategy (a simple link-graph strategy is sketched below).
- The indexer module extracts the words from each page it visits and records their URLs. The result is a large lookup table that maps each word to a list of URLs of pages where that word occurs; the table covers the pages visited during crawling (see the inverted-index sketch below).
- A collection analysis module is another important part of the search engine architecture. It creates a utility index, which may, for example, provide access to pages of a given length or pages containing a certain number of pictures (also sketched below).
- During crawling and indexing, a search engine stores the pages it retrieves in a temporary page repository. Search engines also maintain a cache of the pages they visit, so that already-visited pages can be retrieved quickly.
- The query module of a search engine receives search requests from users in the form of keywords, and the ranking module sorts the results (a minimal query-and-rank sketch appears below).
- The crawler-indexer architecture has many variants; one is the distributed architecture, which consists of gatherers and brokers. Gatherers collect indexing information from web servers, while brokers provide the indexing mechanism and the query interface. Brokers update their indices on the basis of information received from gatherers and other brokers, and they can filter information. Many of today's search engines use this type of architecture (a sketch of the broker's merge step closes the examples below).
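
The sketches below illustrate the modules above in Python; they are minimal illustrations under stated assumptions, not production implementations. First, the crawler/crawl-control loop: a frontier of URLs feeds the crawlers, and each fetched page contributes newly discovered URLs back to the frontier. The dictionary of fetched pages doubles as the temporary page repository. Only the standard library is used; the `max_pages` limit is an illustrative assumption, and a real crawler adds politeness rules (robots.txt, rate limits) and robust error handling.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags found on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed_urls, max_pages=10):
    frontier = deque(seed_urls)   # URLs the control module hands to crawlers
    seen = set(seed_urls)
    pages = {}                    # URL -> raw HTML: the page repository
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
        except (OSError, ValueError):
            continue              # skip unreachable or malformed URLs
        pages[url] = html
        extractor = LinkExtractor()
        extractor.feed(html)
        for link in extractor.links:   # report discovered URLs back to control
            absolute = urljoin(url, link)
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return pages
```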
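One concrete crawl-control strategy from the list: use the link graph of a previous crawl to prioritize candidate URLs by how many known pages point to them. The shape of `link_graph` (each URL mapped to the set of URLs its page links to) is an assumption made for this illustration.

```python
def prioritize(candidates, link_graph):
    """Order candidate URLs so heavily linked-to pages are crawled first.

    link_graph maps each URL to the set of URLs that its page links to.
    """
    in_links = {}
    for source, targets in link_graph.items():
        for target in targets:
            in_links[target] = in_links.get(target, 0) + 1
    return sorted(candidates, key=lambda u: in_links.get(u, 0), reverse=True)
```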
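The indexer's lookup table is an inverted index: each word maps to the set of URLs of pages where it occurs. This sketch tokenizes the raw page text naively; a real indexer would strip HTML markup and normalize terms.

```python
import re
from collections import defaultdict

def build_index(pages):
    """pages maps URL -> page text, e.g. the output of crawl() above."""
    index = defaultdict(set)
    for url, text in pages.items():
        # Lowercase alphanumeric tokens: a stand-in for real tokenization.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index
```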
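A utility index, by contrast, keys pages on structural properties rather than words. Here is a rough sketch for the two properties mentioned above, page length and picture count, using crude proxies for both.

```python
from collections import defaultdict

def build_utility_index(pages):
    """Index pages by length and by a rough count of embedded images."""
    by_length = defaultdict(set)
    by_image_count = defaultdict(set)
    for url, html in pages.items():
        by_length[len(html)].add(url)
        # Counting "<img" occurrences approximates the pictures on a page.
        by_image_count[html.lower().count("<img")].add(url)
    return by_length, by_image_count
```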
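Next, a minimal query-and-ranking pair over the inverted index: keep only the pages that contain every keyword, then rank by total keyword frequency. Real ranking modules combine far richer signals (link analysis, freshness, and so on); frequency here is just a placeholder.

```python
def search(query, index, pages):
    """Return matching URLs, best-ranked first, for a keyword query."""
    words = query.lower().split()
    if not words:
        return []
    # AND semantics: a page must contain every query word.
    matches = set.intersection(*(set(index.get(w, set())) for w in words))
    # Rank by total keyword frequency in the stored page text.
    def score(url):
        text = pages[url].lower()
        return sum(text.count(w) for w in words)
    return sorted(matches, key=score, reverse=True)
```

Tying the sketches together: `pages = crawl(["https://example.com"])`, then `index = build_index(pages)`, then `search("web crawler", index, pages)`; the seed URL and query are purely illustrative.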
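Finally, in the distributed (gatherer/broker) variant, each gatherer produces a partial index and a broker merges what it receives from gatherers and other brokers. A minimal sketch of that merge step, assuming partial indices shaped like the inverted index above:

```python
from collections import defaultdict

def merge_indices(partial_indices):
    """Merge partial inverted indices received from gatherers and brokers."""
    merged = defaultdict(set)
    for partial in partial_indices:
        for word, urls in partial.items():
            merged[word] |= urls
    return merged
```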
Examples of five search engines on the Internet