
Using Visual Web Ripper To Release Data Behind Captcha

By Author: Tracy Morgan

For a web page harvester, CAPTCHA technology is an obstacle that slows your progress. CAPTCHA is human-detection software that requires site visitors to enter a string of words, letters, or numbers that has been stretched, squished, and often obscured by dots. While this makes web page harvesting difficult or even impossible, for Visual Web Ripper it simply adds an extra step. Visual Web Ripper offers two ways of handling CAPTCHA protection: semi-automatic and fully automatic. The semi-automatic mode requires you to decode the CAPTCHA images manually while the project runs. The fully automatic mode needs no manual decoding, but it costs money and requires an account with a third-party CAPTCHA recognition service.

CAPTCHA Data Extraction

The following steps are required for successful CAPTCHA processing. Step 1.3 is required only for fully automatic data extraction:
1.1. Add a content element that selects the CAPTCHA image. Then use the Misc options tab to uncheck the Save content option.
1.2. Add a FormField element that selects the CAPTCHA input field. Then use the Advanced Options tab to select the image element as the CAPTCHA element.
1.3. Use the Advanced Options tab to add a Decode CAPTCHA script to the FormField element that selects the CAPTCHA input field.
1.4. Add a FormSubmit template that submits the CAPTCHA form. If the CAPTCHA form is not always displayed, you may need to enable the Optional template option on the Misc tab.
In semi-automatic mode, when Visual Web Ripper encounters a CAPTCHA element, it displays the image and asks you to type in the code.

Full-Automatic Data Extraction

Register for an account with a third-party CAPTCHA recognition service. http://www.decaptcher.com is highly recommended because it uses a compatible API and the default script is built on its DLL library. This service charges $2 per 1,000 decodes. Sequentum does not charge any additional fees.
Add a Decode CAPTCHA script to the FormField element by clicking the Decode CAPTCHA script option button on the Advanced Options tab. The script calls the recognition service: it receives the CAPTCHA image as an input parameter and returns the decoded CAPTCHA text as a string.
The script editor should open with the default code.
If you are using the http://www.decaptcher.com service, enter your login name and password.
Note: The default Decode CAPTCHA script uses a DLL file provided by http://www.decaptcher.com. Because some antivirus programs flag this file as malicious, it is no longer included in the Visual Web Ripper installer package. Download the file separately from www.visualwebripper.com/download/decaptcher.zip and place it in the Visual Web Ripper installation folder.
If you are using a service other than Decaptcher, you can write your own script in the editor. A working knowledge of object-oriented programming, particularly in C# or VB.NET, is extremely useful.
Note: When writing the code, do not change the name of the DecodeCaptcha method.

Getting past a CAPTCHA dialog box is tricky for any web page harvesting project, and in many cases CAPTCHA-protected sites are an insurmountable or unprofitable challenge. By following the steps above, you can use Visual Web Ripper to harvest the data you need.


For more information about data extraction software, please visit www.visualwebripper.com
