Added: Shamekia Cilley - Date: 06.12.2021 07:29
The basic functions of a search engine can be described as crawling, data mining, indexing, and query processing.
Crawling is the act of sending small programmed bots out to collect information. Data mining is storing the information collected by the bots. Indexing is ordering the information systematically. And query processing is the mathematical process in which a person's query is compared to the index and the results are presented to that person. Crawling the Web by bots, also called spiders, is the process in which small programmed entities go out from the central computer to collect data.
They are pre-programmed to start at one website and collect all its information and links. Those links are then recorded. That list of links then becomes the order in which the bot will continue its path of data collection. So, a spider might start at lifepacific. After the spider is full, or after a set time, the bot returns and uploads the content of the web pages and all the links back to the central computer. Data mining is the collection of all the data that the bot returned.
Entire web pages, preserved in HTML, are stored on the servers of the search engine. The stored version is not the live version of the web page, what you see when you enter the URL in your browser, but a historical version called the cached version. Bots can be told to return to web pages often, if the content changes often. So, a website like BBC News would request that the bots return often because of how frequently its content changes. Bots will not find everything on the web.
If there are no links to a page, then it is basically invisible to search engines. If it is a web page that requires a password, or is generated as a response to a query, it will never be stored in a search engine. Those web pages that will never be searched are called the deep web or the invisible web.
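The crawling process described above, and the reason unlinked pages stay invisible, can be sketched roughly as follows. This is a minimal illustration, not how any particular search engine is implemented: the page names in `TOY_WEB` and the `crawl` function are invented for the example, and the "web" is an in-memory dictionary, so nothing is fetched over the network.

```python
from collections import deque

# Hypothetical in-memory "web": each page maps to the links found on it.
# The page names are invented for illustration.
TOY_WEB = {
    "home": ["about", "news"],
    "about": ["home", "staff"],
    "news": ["home", "archive"],
    "staff": [],
    "archive": ["news"],
    "hidden": ["home"],  # no page links here, so the crawler never reaches it
}


def crawl(start, web, max_pages=100):
    """Visit pages in the order their links were recorded (breadth-first)."""
    queue = deque([start])  # recorded links still waiting to be visited
    seen = {start}          # pages already queued, so loops are not followed twice
    visited = []            # order in which pages were collected
    while queue and len(visited) < max_pages:
        page = queue.popleft()
        visited.append(page)
        for link in web.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


order = crawl("home", TOY_WEB)
print(order)
```

The `max_pages` limit stands in for the spider being "full". The page named `hidden` links out to `home`, but because nothing links to it, it never enters the queue.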
Indexing orders the stored information so that it can be looked up. The same concept is found in the back of a book, where major words are listed along with the pages they occur on. Google's index, the largest known internet index, called the Big Table, is so large it has to have indices to the indices; there are huge amounts of data present.
The indexing process not only cites locations, but converts everything into numbers. Computers function on 1's and 0's, not on the English alphabet, or any other alphabet for that matter. The process of converting the words to numbers is important, because the process of searching is not based on words and letters, but on math.
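A back-of-the-book style index that also turns words into numbers might look something like the sketch below. It is only an assumption-laden illustration: the sample pages in `PAGES`, the `build_index` function, and the scheme of handing out sequential integer IDs are all invented here, and real engines use far more elaborate encodings.

```python
from collections import defaultdict

# Hypothetical cached pages, invented for illustration.
PAGES = {
    1: "the savior of the city",
    2: "a city guide",
    3: "the saviour returns to the city",
}


def build_index(pages):
    """Return (term_ids, index); index maps term ID -> {page ID: [positions]}."""
    term_ids = {}                                   # word -> integer ID
    index = defaultdict(lambda: defaultdict(list))  # ID -> page -> word positions
    for page_id, text in pages.items():
        for position, word in enumerate(text.lower().split()):
            # Assign the next unused integer the first time a word is seen.
            term_id = term_ids.setdefault(word, len(term_ids))
            index[term_id][page_id].append(position)
    return term_ids, index


term_ids, index = build_index(PAGES)
city = term_ids["city"]
print(city, dict(index[city]))
```

Looking up `city` returns its number and, for each page, the word positions where it occurs, which is the same information a book index gives with page references.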
The query, what you enter in the search box, has to be converted to numbers, so that the engine can process your request. Before it converts the query to numbers, though, the search engine will get rid of several terms. Most search engines have a list of stop words, words that will not be searched. Most search engines will not search for the, and, it, be, will, etc. Those short words are just filler to the computer. If you absolutely need those words in the search, then you must include them in quotation marks, or in Google add the plus sign before the term. Once the terms are converted to numbers, the engine then calculates which indexed terms are mathematically closest to what you asked for.
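The stop-word step and the conversion of a query to numbers could be sketched like this. The vocabulary in `TERM_IDS` and its ID values are made up, `process_query` is a hypothetical helper, and only the quotation-mark rule from the text is modelled (the plus-sign form is left out).

```python
import re

STOP_WORDS = {"the", "and", "it", "be", "will"}  # the examples named in the text

# Hypothetical vocabulary with made-up term IDs.
TERM_IDS = {"history": 10, "of": 11, "the": 12, "web": 13, "will": 14}


def process_query(query):
    """Drop stop words unless they are quoted, then convert known terms to IDs."""
    ids = []
    # Each match is either a quoted phrase (first group) or a bare word (second).
    for quoted, bare in re.findall(r'"([^"]*)"|(\S+)', query.lower()):
        if quoted:
            words = quoted.split()   # quoted words are always kept
        elif bare in STOP_WORDS:
            continue                 # filler word: not searched
        else:
            words = [bare]
        ids.extend(TERM_IDS[w] for w in words if w in TERM_IDS)
    return ids


print(process_query('the history of "the web"'))
```

In this example the first `the` is discarded as filler, while the quoted `the` survives because the quotation marks force it into the search.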
The algorithm is complex, but it returns items based on how close each one is mathematically to your query. Those closer are listed higher on the list. Some engines will even show a percent of relevance. Higher scores for relevance are shaped by: whether the words are in the title as opposed to just being in the text, whether the word occurs in bold or italics on the page, how many times the word occurs on a page, the number and quality of links to that page, and whether the words occur in the header (the invisible cloud of tags created by the web programmer).
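To make the ranking factors above concrete, here is a toy scoring function. Everything in it is an assumption for illustration: the `Page` class, the `score` function, and especially the numbers in `WEIGHTS` are invented, header tags are left out, and real ranking algorithms are far more complex and often secret.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    body: str
    bold_words: frozenset = frozenset()  # words shown in bold or italics
    inbound_links: int = 0               # number of links pointing at the page


# Made-up weights; a real engine's weighting is not public.
WEIGHTS = {"title": 5.0, "bold": 2.0, "body": 1.0, "link": 0.5}


def score(page, term):
    """Toy relevance score for one search term on one page."""
    term = term.lower()
    total = 0.0
    if term in page.title.lower().split():
        total += WEIGHTS["title"]
    if term in page.bold_words:
        total += WEIGHTS["bold"]
    total += WEIGHTS["body"] * page.body.lower().split().count(term)
    if total > 0:
        # Links only boost pages that mention the term at all.
        total += WEIGHTS["link"] * page.inbound_links
    return total


pages = {
    "a": Page("Spiders and crawling", "a spider follows links", inbound_links=2),
    "b": Page("Spider basics", "the spider is a bot a spider crawls",
              frozenset({"spider"}), 4),
    "c": Page("Cooking", "recipes", inbound_links=50),
}
ranked = sorted(pages, key=lambda name: score(pages[name], "spider"), reverse=True)
print(ranked)
```

Page `b` ranks first because the term appears in its title, in bold, and twice in the text; page `c` has many links but never mentions the term, so it scores zero.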
Something to keep in mind: you are not searching the entire internet when you search. You are only searching an index of the internet.
Google has the largest index and will return billions of hits, but Yahoo is smaller and will return fewer hits. The difference is not just how many hits, but also that they are different hits. Each search engine sent bots in different directions, so they have indexed different parts of the web.
Not only that, but the list will be different because they work off different algorithms; many exist, and some are guarded secrets. Now that you know that you are searching an index, and that the index is not the words but mathematical representations, constructing a search query should make more sense.
Keyword searching, then, is just a matter of matching numbers in the index. Not a problem. Phrase searching is looking for exact matches of strings. Not a problem for the search engine. Wildcards and truncation work because the token, the number representing a term, can be searched and wildcards can be put into it. My example on the other page was savior vs saviour. In the index they might be represented by two made-up numbers that differ only by one extra position. The wildcard would then tell the search engine to look at the index and accept any number in the extra space, and truncation would just cut both tokens down to a common root. Boolean operators force the search engine to use multiple entries in the index.
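The wildcard and truncation behaviour described above can be approximated with Python's standard `fnmatch` module. This sketch matches patterns against the term strings themselves rather than against numeric tokens, purely to keep it readable; the `VOCABULARY` list and the `expand` helper are invented for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical list of indexed terms, invented for illustration.
VOCABULARY = ["save", "savior", "saviour", "saviours", "savour", "search"]


def expand(pattern, vocabulary):
    """Return every indexed term that matches a wildcard pattern."""
    return [term for term in vocabulary if fnmatchcase(term, pattern)]


# Wildcard inside the word: '*' stands for zero or more characters.
print(expand("savio*r", VOCABULARY))

# Truncation: cut the word down to a root and accept any ending.
print(expand("savio*", VOCABULARY))
```

The first pattern picks up both spellings, savior and saviour; the truncated pattern also picks up the plural, because anything may follow the root.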
OR basically asks for 2 searches and combines the results. AND searches for both terms, but only returns those results that are in common; it has to compare the results.
NOT is just the removal of common results, with the common results being left out of the list. Proximity searching gets tricky, because the engine must search first for pages in common, like the AND function. But then the engine compares locations of the terms; they are on the same web page, but are they close enough in the text?
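The Boolean operators and proximity searching described above map naturally onto set operations. The sketch below assumes a small positional index whose contents are made up; `INDEX`, `pages`, and the `search_*` functions are hypothetical names, not any engine's actual interface.

```python
# Hypothetical positional index: term -> {page ID: [word positions]}.
INDEX = {
    "search": {1: [0, 7], 2: [3], 4: [10]},
    "engine": {1: [1], 3: [5], 4: [40]},
}


def pages(term):
    """Set of page IDs on which a term occurs."""
    return set(INDEX.get(term, {}))


def search_or(a, b):
    return pages(a) | pages(b)   # two searches, results combined


def search_and(a, b):
    return pages(a) & pages(b)   # only the results in common


def search_not(a, b):
    return pages(a) - pages(b)   # results for a, with the common ones removed


def search_near(a, b, max_gap):
    """Pages where a and b occur within max_gap words of each other."""
    hits = set()
    for page in search_and(a, b):  # first find the pages in common
        if any(abs(pa - pb) <= max_gap
               for pa in INDEX[a][page] for pb in INDEX[b][page]):
            hits.add(page)
    return hits


print(search_or("search", "engine"))
print(search_and("search", "engine"))
print(search_not("search", "engine"))
print(search_near("search", "engine", max_gap=3))
```

Pages 1 and 4 both contain the two terms, but only page 1 passes the proximity test, because on page 4 the terms are 30 words apart.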
And the amazing thing is that the search takes less than a second. Knowing how the search engine works may help you think more about how you formulate your query; it should.