“You must learn to crawl before you can search” #SharePointProverbs
Search is one of SharePoint's major USPs: it is robust and customizable. In MOSS 2007 and SharePoint 2010, FAST search comes as an additional component, but in SharePoint 2013 FAST is integrated into enterprise search by default. This post explains the architecture and wiring of enterprise search in SharePoint.
ABC of a search:
The word “googol” is a mathematical term for the number represented by the numeral 1 followed by 100 zeros. The name Google plays on it, reflecting the goal of working with a seemingly infinite amount of information on the web. When you run a search, the results are not retrieved from the original data sources but from indexes. The search engine builds and refreshes these indexes continuously, every day and every hour, through a process called “crawling”. An index works like a set of “pointers” to the available content, and the programs that crawl the web to build it are often called “spiders”. Finally, matching content is listed in the order determined by the search engine's “ranking algorithms”. Google has crawled around 30 trillion pages on the web, storing over 100 million GB of search index, and its ranking algorithms consider around 200 factors.
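To make the crawl, index and rank pipeline concrete, here is a minimal Python sketch, assuming a tiny in-memory set of pages. The page URLs and the function names `crawl` and `search` are hypothetical, and ranking by summed term frequency is a deliberately naive stand-in for a real ranking algorithm.

```python
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def crawl(pages: dict[str, str]) -> dict[str, Counter]:
    """Read every page once and build an inverted index:
    term -> Counter of {url: how often the term occurs in that page}."""
    index: dict[str, Counter] = defaultdict(Counter)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term][url] += 1
    return index


def search(index: dict[str, Counter], query: str) -> list[str]:
    """Answer a query from the index alone (the pages are never re-read)
    and rank hits by summed term frequency, a naive scoring rule."""
    scores: Counter = Counter()
    for term in tokenize(query):
        scores.update(index.get(term, Counter()))
    # most_common() orders by score, highest first
    return [url for url, _score in scores.most_common()]
```

Note that `search` never touches the original pages; it answers entirely from the index, just as a search engine answers from its index rather than from the original sources.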
Enterprise search in SharePoint:
The SharePoint enterprise search process is analogous to Google search.
The crawler connects to content sources such as file shares, FTP locations, content databases and websites, and crawls their content according to the crawl schedule. As you can imagine, a full crawl is resource-intensive because it rebuilds the whole index, whereas an incremental crawl only updates the index with the changes made since the last crawl.
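The difference between the two crawl types can be sketched in Python as follows. This is a toy model, not SharePoint's crawler: a content source is assumed to be a dictionary of `{url: (last_modified, text)}` with integer timestamps, and `CrawlIndex` and its methods are hypothetical names. The `docs_processed` counter shows why a full crawl costs more: it reads every document, while an incremental crawl only touches what changed or was deleted.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class CrawlIndex:
    """Toy index supporting a full crawl and an incremental crawl.
    Source format {url: (last_modified, text)} is an assumption."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)  # term -> URLs
        self.doc_terms: dict[str, set[str]] = {}  # URL -> terms indexed for it
        self.last_crawl = None  # timestamp of the previous crawl
        self.docs_processed = 0  # documents read or removed by the last crawl

    def _remove(self, url: str) -> None:
        """Drop every posting that points at url."""
        for term in self.doc_terms.pop(url, set()):
            self.postings[term].discard(url)
            if not self.postings[term]:
                del self.postings[term]

    def _add(self, url: str, text: str) -> None:
        terms = tokenize(text)
        self.doc_terms[url] = terms
        for term in terms:
            self.postings[term].add(url)

    def full_crawl(self, source, now) -> None:
        """Discard the whole index and re-read every document (expensive)."""
        self.postings.clear()
        self.doc_terms.clear()
        for url, (_modified, text) in source.items():
            self._add(url, text)
        self.docs_processed = len(source)
        self.last_crawl = now

    def incremental_crawl(self, source, now) -> None:
        """Re-index only documents changed since the last crawl; drop deleted ones."""
        if self.last_crawl is None:
            self.full_crawl(source, now)  # no baseline yet, fall back to a full crawl
            return
        processed = 0
        for url in set(self.doc_terms) - set(source):
            self._remove(url)  # document was deleted at the source
            processed += 1
        for url, (modified, text) in source.items():
            if modified > self.last_crawl:
                self._remove(url)  # replace the stale postings
                self._add(url, text)
                processed += 1
        self.docs_processed = processed
        self.last_crawl = now
```

Detecting changes purely by timestamp is a simplification; a document that first appears with an old timestamp would be missed by this sketch.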
These activities are performed by the index server (the indexing engine), and the resulting index is stored as files in the file system. The search engine processes these index files, together with the crawled properties and metadata values, to produce the final results.
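As an illustration of an index persisted to the file system, the sketch below writes an inverted index and per-document metadata to two JSON files and reads them back at query time. The file names, the JSON layout and the property names (`Title`, `Author`) are assumptions for illustration only; SharePoint's actual index file format is internal and is not reproduced here.

```python
import json
import tempfile
from pathlib import Path


def save_index(index_dir, postings, properties):
    """Persist the index and per-document metadata as files.
    postings: term -> set of URLs; properties: URL -> {name: value}.
    File names and JSON layout are hypothetical."""
    index_dir = Path(index_dir)
    index_dir.mkdir(parents=True, exist_ok=True)
    # JSON has no set type, so each posting list is stored as a sorted list
    serializable = {term: sorted(urls) for term, urls in postings.items()}
    (index_dir / "postings.json").write_text(json.dumps(serializable))
    (index_dir / "properties.json").write_text(json.dumps(properties))


def load_index(index_dir):
    """Read both files back, turning posting lists into sets again."""
    index_dir = Path(index_dir)
    raw = json.loads((index_dir / "postings.json").read_text())
    postings = {term: set(urls) for term, urls in raw.items()}
    properties = json.loads((index_dir / "properties.json").read_text())
    return postings, properties


def query_with_metadata(index_dir, term):
    """Look the term up in the index, then attach each hit's stored metadata."""
    postings, properties = load_index(index_dir)
    return [{"url": url, **properties.get(url, {})}
            for url in sorted(postings.get(term, ()))]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        save_index(d, {"budget": {"b", "a"}},
                   {"a": {"Title": "Budget 2013", "Author": "Alice"}})
        print(query_with_metadata(d, "budget"))
```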
The query engine translates a search query issued from any user interface, such as the Search Center or the SharePoint APIs, into a syntax based on the standard SQL-92 and SQL-99 database query formats to retrieve the results from the index files. Before any query can be answered, an important activity called “index propagation” sends a local copy of the index files to each query server, so that each search request is served locally instead of going across the network to the index server, which improves response time. (A major difference between MOSS and SharePoint 2010: in MOSS, full copies of the index are stored on both the index and query servers, whereas in SharePoint 2010, temporary index files from the index server are pushed to the query servers as they are created.)
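Index propagation can be modelled as an index server pushing copies of its index to query servers, which then answer queries from their local copy. In this minimal Python sketch the `IndexServer` and `QueryServer` classes and the server names are hypothetical, and the whole index is copied on every propagation for simplicity, which is closer to the MOSS model than to SharePoint 2010's continuous push.

```python
import copy


class QueryServer:
    def __init__(self, name):
        self.name = name  # hypothetical server name
        self.local_index = {}  # propagated copy: term -> set of URLs

    def search(self, term):
        """Answer from the local copy only; no round trip to the index server."""
        return sorted(self.local_index.get(term, set()))


class IndexServer:
    def __init__(self):
        self.index = {}  # master index built by crawling: term -> set of URLs
        self.query_servers = []

    def add_document(self, url, terms):
        for term in terms:
            self.index.setdefault(term, set()).add(url)

    def propagate(self):
        """Push an independent copy of the index to every query server."""
        for server in self.query_servers:
            # deepcopy so later crawls do not silently change the served copy
            server.local_index = copy.deepcopy(self.index)
```

Between propagations a query server answers from a slightly stale copy, which is the trade-off for not crossing the network on every query.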
Index partitions provide a way to divide the index built from various content sources among different query servers. On top of this, each index partition can also be hosted on additional query servers as a copy called a “secondary index”, so that even if one query server goes down, that part of the index is not lost.
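The following Python sketch shows the idea of index partitions with secondary copies: each document is assigned to a partition by a stable hash of its URL, each partition lists the query servers that host a copy, and a query is sent to one live copy of every partition. The class, the server names and the CRC32-based assignment are assumptions for illustration, not SharePoint's actual partitioning scheme.

```python
import zlib


class PartitionedIndex:
    """Index split into partitions, each hosted on one or more query servers.
    hosts maps partition number -> list of hypothetical server names holding
    a copy; the first is the primary, the rest are secondary copies."""

    def __init__(self, num_partitions, hosts):
        self.num_partitions = num_partitions
        self.hosts = hosts
        # Partition contents; every host of a partition serves an identical copy
        self.partitions = [{} for _ in range(num_partitions)]  # term -> set of URLs
        self.down = set()  # names of query servers that have failed
        self.last_served_by = {}  # partition -> host that answered the last query

    def partition_for(self, url):
        # crc32 gives the same result on every run, unlike Python's salted hash()
        return zlib.crc32(url.encode("utf-8")) % self.num_partitions

    def add_document(self, url, terms):
        part = self.partitions[self.partition_for(url)]
        for term in terms:
            part.setdefault(term, set()).add(url)

    def search(self, term):
        """Send the query to one live copy of every partition and merge the hits."""
        results = set()
        for p in range(self.num_partitions):
            live = [h for h in self.hosts[p] if h not in self.down]
            if not live:
                raise RuntimeError(f"partition {p} unavailable: every copy is down")
            self.last_served_by[p] = live[0]  # primary if alive, else a secondary
            results |= self.partitions[p].get(term, set())
        return sorted(results)
```

If every copy of some partition is down, the sketch raises an error rather than silently returning partial results; that is exactly the situation secondary copies are meant to prevent.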