What we will need:
- Ruby
- tesseract-ocr gem
- RMagick gem
The original CAPTCHA
First of all, the image is too small, so we need to resize it. (Save the image as "1.png".)
require 'tesseract'
require 'rmagick'
include Magick
img_raw = ImageList.new("1.png")
img_res = img_raw.resize(600,250)
img_res.display
Here is the result!
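The fixed 600x250 target works for this one CAPTCHA. For images with other dimensions, a small helper could compute an enlarged size that keeps the aspect ratio. This is only a sketch: scaled_size and the factor of 4 are made-up names and values, not part of RMagick, and the 150x62 input is just an example, not the measured size of the CAPTCHA above.

```ruby
# Hypothetical helper (not part of RMagick): compute an enlarged size
# that keeps the original aspect ratio.
def scaled_size(width, height, factor)
  raise ArgumentError, 'factor must be positive' unless factor.positive?

  [(width * factor).round, (height * factor).round]
end

# Example input only: a 150x62 image enlarged 4 times.
new_width, new_height = scaled_size(150, 62, 4)
puts "#{new_width}x#{new_height}"
```

The two numbers could then be passed to resize in place of the hard-coded 600 and 250.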
Now we will extract the letters.
Let's try to increase the contrast!
require 'tesseract'
require 'rmagick'
include Magick
img_raw = ImageList.new("1.png")
img_res = img_raw.resize(600,250)
img_lev = img_res.level(0,0.04)
img_lev.display
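To get a feel for why a levels adjustment increases contrast, here is a simplified model of what it does to a single channel value: everything at or below the black point becomes black, everything at or above the white point becomes white, and the values in between are stretched over the full range. The function level_value and the points 0.4 and 0.6 are assumptions chosen for illustration. This is not RMagick's implementation, and it does not try to reproduce how RMagick interprets the arguments 0 and 0.04 used in the script.

```ruby
# Simplified model of a levels adjustment on one normalized channel
# value (0.0 = black, 1.0 = white). Illustration only; this is not
# RMagick's implementation.
def level_value(value, black_point, white_point)
  raise ArgumentError, 'white_point must be above black_point' unless white_point > black_point

  stretched = (value - black_point) / (white_point - black_point).to_f
  stretched.clamp(0.0, 1.0)
end

# Greys close to each other are pushed apart when the range
# 0.4..0.6 is stretched to 0.0..1.0.
[0.3, 0.45, 0.55, 0.7].each do |v|
  puts format('%.2f -> %.2f', v, level_value(v, 0.4, 0.6))
end
```

Light-grey noise and dark letters that were close in brightness end up far apart, which is what makes the letters stand out.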
Let's make it easier for "tesseract"!
Let's sharpen the image!
require 'tesseract'
require 'rmagick'
include Magick
img_raw = ImageList.new("1.png")
img_res = img_raw.resize(600,250)
img_lev = img_res.level(0,0.04)
img_fin = img_lev.sharpen(0,5)
img_fin.display
Everything looks clearer now, so let's start the decoding!
require 'tesseract'
require 'rmagick'
include Magick
img_raw = ImageList.new("1.png")
img_res = img_raw.resize(600,250)
img_lev = img_res.level(0,0.04)
img_fin = img_lev.sharpen(0,5)
eng = Tesseract::Engine.new
eng.language = :eng
eng.whitelist = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
text = eng.text_for(img_fin).strip.delete(' ')
puts "Found string '#{text}'"
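Here I create a new Tesseract engine object, tell it that the language is English, and have it look only for specific characters (in this case, capital letters only).
The recognized text is then stripped of surrounding whitespace, and any spaces inside it are deleted.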
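The strip.delete(' ') chain removes surrounding whitespace and inner spaces, but a line break in the middle of the string, if the engine ever returns one, would still get through. A slightly stricter cleanup could drop everything that is not a capital letter. This is a minimal sketch: clean_ocr_text is a made-up helper name, and the input string is an invented example, not real OCR output.

```ruby
# Hypothetical helper: keep only the capital letters A-Z from raw OCR
# output and drop everything else (spaces, line breaks, stray symbols).
def clean_ocr_text(raw)
  raw.to_s.delete('^A-Z')
end

# Invented input, only to show the call.
puts clean_ocr_text(" XK QZ\n\n")
```

In the script, this would take the place of .strip.delete(' ') on the string returned by text_for.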
The result!
The success rate is not really high!
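To put a number on the success rate, one option is to solve a handful of CAPTCHAs by hand and compare the known answers with what the script prints. The sketch below assumes such a list exists; success_rate is a made-up helper, and the sample pairs are invented placeholders, not real measurements.

```ruby
# Hypothetical helper: fraction of CAPTCHAs read correctly.
# `results` is a list of [expected, found] pairs.
def success_rate(results)
  return 0.0 if results.empty?

  hits = results.count { |expected, found| expected == found }
  hits.to_f / results.size
end

# Made-up sample data, only to show the call; not real measurements.
sample = [
  %w[ABCD ABCD],
  %w[EFGH EFCH],
  %w[IJKL IJKL],
  %w[MNOP MNQP]
]
puts format('%.0f%% solved', success_rate(sample) * 100)
```

Running the same comparison after each change to the resize, level, or sharpen values shows whether that change actually helps.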