Ten Things You Should Know About Document Indexing

By Author: Manuel J. Montesino

Document indexing is what makes fast document retrieval possible. As you may have noticed, Internet search engines retrieve documents relevant to your specific query from among billions of documents on the Web in less than a second. This would be simply impossible if they had to search through all those billions of documents in response to each query.

1. Search engines use what is called an inverted index, which lists the documents against each word instead of the words in each document. In response to a query, the engines look up the query words in their index and then list the documents recorded against those words.

2. Typically there will be hundreds of documents, if not thousands, against each word. It then becomes necessary to rank the documents in order of relevance to the query. Relevance is determined by using certain rules set by the engine, and typically involves more than the density of the particular query words in each document.

3. The major search engines do what is known as full-text indexing, i.e. they check all the words in the document's content and list the document against each of these words (except perhaps very common words like 'the').

4. Not all indexing is full-text indexing. Full-text indexes tend to be huge, requiring much storage space on their own. Indexing by document meta tags takes up much less space. The meta tags provide information about the document that helps retrieve it. For example, a brief note about the content of the document, its date of creation/modification and the author name might be attached as meta tags to each document.

5. Meta tag indexing requires that the user has an idea of what the tags contain so that the person can query using these values. This is typically achieved by having standard practices for describing document contents and document naming. Often, drop-down selection boxes of such descriptions and names are used for manually tagging the document so that different users will use the same terms for similar documents.

6. Indexing is mainly used with unstructured documents, such as correspondence, reports, articles and so on. Structured documents such as transaction records are typically stored in databases, and have unique IDs for each document. Database queries can then bring up the right document in little time (instead of the many documents brought up by search queries).

7. Computer systems typically add certain meta information automatically to each document they create or modify. The date of creation and document author name are examples of such automatically added data. Other data such as document content description can be manually added by the user, or added using such devices as standard-description barcode cards.

8. Indexing can be specialized, as when scientific documents are indexed using scientific notation rather than standard words. The key issue is ease of subsequent retrieval. People searching for scientific documents, for example, will typically find it easier to retrieve them using the specialized notations.

9. When paper documents are scanned into digital images, the images cannot be indexed as they are. Instead, they need to be processed further using tools such as OCR (Optical Character Recognition) software, which converts the images of text characters into standard, machine-readable ASCII or Unicode characters.

10. Indexing is not the only way to facilitate the subsequent retrieval of documents. A hierarchical directory structure with meaningfully named folders and subfolders, together with proper classification of documents and their storage in the relevant subfolders, can enable quick browsing to the correct folder and retrieval. Where necessary, this can be combined with folder-level indexing and search.
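
The ideas in points 1 to 3 (an inverted index, ranking by relevance, and full-text indexing that skips very common words) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not how any real search engine works: the stop-word list, the sample documents and the function names are all made up for the example, and relevance is approximated by a simple count of query words.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; real engines use much longer ones.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}

def tokenize(text):
    """Lowercase the text and return its words, skipping stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

def build_index(docs):
    """Map each word to {doc_id: number of occurrences in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word][doc_id] = index[word].get(doc_id, 0) + 1
    return index

def search(index, query):
    """Return ids of documents containing every query word, best match first."""
    words = tokenize(query)
    if not words:
        return []
    postings = [index.get(w, {}) for w in words]
    matches = set(postings[0]).intersection(*postings[1:])
    # Score = total count of query words in the document,
    # a crude stand-in for a real relevance rule.
    scores = {d: sum(p[d] for p in postings) for d in matches}
    return sorted(matches, key=lambda d: (-scores[d], d))

# Made-up sample documents.
docs = {
    "memo": "The quarterly report is attached to the memo.",
    "report": "Sales report: the report covers sales in every region.",
    "letter": "A letter to the regional sales manager.",
}
index = build_index(docs)
print(search(index, "report"))        # ['report', 'memo']
print(search(index, "sales report"))  # ['report']
```

The search never reads the documents themselves. It only looks up each query word in the index, which is why the lookup stays fast however many documents were indexed.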

Without the ability to index thousands of documents using, say, a desktop search facility, businesses might find that retrieving unstructured documents is a tough, and often simply impossible, task. Indexing, whether full-text or meta tag based, changes the situation dramatically, making it possible to retrieve even a particular e-mail comparatively quickly. Indexing is thus a powerful business tool.
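
For comparison with full-text indexing, the meta tag approach described in points 4 and 5 can be sketched as follows. Again, this is a hedged illustration only: the field names, the list of allowed document types (standing in for a drop-down selection box) and the document ids are assumptions made for the example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical controlled vocabulary, standing in for a drop-down selection box.
DOCUMENT_TYPES = {"invoice", "contract", "report"}

def tag_document(meta_index, doc_id, doc_type, author, created):
    """Record a document under each of its tag values; reject non-standard types."""
    if doc_type not in DOCUMENT_TYPES:
        raise ValueError(f"unknown document type: {doc_type!r}")
    tags = {"type": doc_type, "author": author, "created": created}
    for field, value in tags.items():
        meta_index[(field, value)].add(doc_id)

def find(meta_index, **criteria):
    """Return the sorted ids of documents matching every field=value pair."""
    sets = [meta_index.get((f, v), set()) for f, v in criteria.items()]
    return sorted(set.intersection(*sets)) if sets else []

# Made-up sample documents.
meta_index = defaultdict(set)
tag_document(meta_index, "doc-001", "invoice", "J. Smith", date(2024, 1, 15))
tag_document(meta_index, "doc-002", "report", "J. Smith", date(2024, 2, 1))
tag_document(meta_index, "doc-003", "invoice", "A. Jones", date(2024, 2, 1))

print(find(meta_index, author="J. Smith"))                         # ['doc-001', 'doc-002']
print(find(meta_index, type="invoice", created=date(2024, 2, 1)))  # ['doc-003']
```

The index holds only three entries per document no matter how long the document is, which is why it takes far less space than a full-text index. The trade-off is that a search succeeds only if the user queries with the same values that were used when tagging.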

About Author:

Ademero, Inc. develops document archiving software. Based largely on user experience, the company's flagship product, Content Central™, is a browser-based document management software system created to provide businesses and other organizations with a convenient way to capture, retrieve, and manage information originating in hard copy or digital form. Access a live preview of this document management solution by visiting the Ademero web site.
