Captcha decoding with Ruby




What we will need:

  • Ruby
  • tesseract-ocr gem
  • RMagick gem



The original CAPTCHA


First of all, the image is too small, so we need to resize it. (Save the image as "1.png".)


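require 'tesseract'
require 'rmagick'   # older RMagick releases use require 'RMagick'
include Magick

# Load the CAPTCHA and enlarge it to 600x250 pixels.
img_raw = ImageList.new("1.png")
img_res = img_raw.resize(600,250)
img_res.display     # preview the result (needs an X11 display)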

Here is the result!
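
The fixed 600x250 target fits this particular CAPTCHA. For images with other proportions, a small helper can work out a size that keeps the aspect ratio. This is only a sketch: `scaled_size` is a hypothetical helper (not part of RMagick), and the 120x50 input size is an assumed example, not the measured size of the image above.

```ruby
# Hypothetical helper (not part of RMagick): compute a target size that
# keeps the original aspect ratio when scaling to a given width.
def scaled_size(width, height, target_width)
  unless [width, height, target_width].all?(&:positive?)
    raise ArgumentError, 'dimensions must be positive'
  end

  factor = target_width.to_f / width
  [target_width, (height * factor).round]
end

# Assumed example: a 120x50 image scaled to a width of 600 pixels.
new_width, new_height = scaled_size(120, 50, 600)
puts "#{new_width}x#{new_height}"
```

The two numbers could then be passed on as `img_raw.resize(new_width, new_height)`.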


Now we will extract the letters.
Let's try to increase the contrast!


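require 'tesseract'
require 'rmagick'   # older RMagick releases use require 'RMagick'
include Magick

img_raw = ImageList.new("1.png")
img_res = img_raw.resize(600,250)
# Adjust the levels to raise the contrast between letters and background.
img_lev = img_res.level(0,0.04)
img_lev.display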


Another Big Image

[Image: lNzrmCm.png]
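
To get a feeling for what a level adjustment does, here is a pure-Ruby sketch on normalized gray values (0.0 is black, 1.0 is white). Everything at or below the black point becomes black, everything at or above the white point becomes white, and the values in between are stretched linearly. It illustrates the general idea only: `level_value` is a hypothetical helper, the sample pixels and thresholds are made up, and the way RMagick scales its own arguments is not modelled here.

```ruby
# Illustration only: a linear level adjustment on normalized gray values
# (0.0 = black, 1.0 = white). This is not RMagick's implementation.
def level_value(value, black_point, white_point)
  unless white_point > black_point
    raise ArgumentError, 'white_point must be greater than black_point'
  end

  stretched = (value - black_point) / (white_point - black_point).to_f
  stretched.clamp(0.0, 1.0)
end

# Hypothetical sample pixels and thresholds, chosen only for the demo.
pixels = [0.10, 0.30, 0.50, 0.70, 0.90]
leveled = pixels.map { |pixel| level_value(pixel, 0.3, 0.7).round(2) }
p leveled
```

Dark grays are pushed to black and light grays to white, which is why the letters stand out more after this step.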

Let's make it easier for "tesseract"!
Let's sharpen the image!



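require 'tesseract'
require 'rmagick'   # older RMagick releases use require 'RMagick'
include Magick

img_raw = ImageList.new("1.png")
img_res = img_raw.resize(600,250)
img_lev = img_res.level(0,0.04)
# Sharpen the edges of the letters (arguments: radius, sigma).
img_fin = img_lev.sharpen(0,5)
img_fin.display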


Image 

[Image: Y772Rh5.png]
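
For the curious, here is roughly what sharpening does to the pixels around an edge. Note the swap: this sketch uses a simple fixed 3x3 kernel, while ImageMagick's sharpen builds a Gaussian-based operator from the radius and sigma, so the numbers will not match RMagick's output. `sharpen_grid` and the sample patch are hypothetical and only for illustration.

```ruby
# Illustration only: sharpen a small grayscale grid with a fixed 3x3 kernel.
# ImageMagick's sharpen uses a Gaussian-based operator instead.
SHARPEN_KERNEL = [
  [0, -1, 0],
  [-1, 5, -1],
  [0, -1, 0]
].freeze

def sharpen_grid(grid)
  height = grid.length
  width = grid.first.length
  grid.each_with_index.map do |row, y|
    row.each_index.map do |x|
      sum = 0
      SHARPEN_KERNEL.each_with_index do |kernel_row, ky|
        kernel_row.each_with_index do |weight, kx|
          # Clamp coordinates so border pixels reuse their nearest neighbour.
          sy = (y + ky - 1).clamp(0, height - 1)
          sx = (x + kx - 1).clamp(0, width - 1)
          sum += weight * grid[sy][sx]
        end
      end
      # Keep the result inside the 0..255 gray range.
      sum.clamp(0, 255)
    end
  end
end

# Hypothetical 3x4 patch with a soft vertical edge (gray values 0..255).
patch = [
  [100, 100, 150, 150],
  [100, 100, 150, 150],
  [100, 100, 150, 150]
]
sharpen_grid(patch).each { |row| p row }
```

The pixels on either side of the edge move further apart (the dark side gets darker, the bright side brighter), which gives the OCR engine cleaner letter outlines.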

Everything looks clearer now, so let's start the decoding!


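require 'tesseract'
require 'rmagick'   # older RMagick releases use require 'RMagick'
include Magick

# Same preprocessing as before: resize, raise the contrast, sharpen.
img_raw = ImageList.new("1.png")
img_res = img_raw.resize(600,250)
img_lev = img_res.level(0,0.04)
img_fin = img_lev.sharpen(0,5)

# Set up the OCR engine: English, capital letters only.
eng = Tesseract::Engine.new
eng.language = :eng
eng.whitelist = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

# Run the OCR, then drop surrounding whitespace and any spaces.
text = eng.text_for(img_fin).strip.delete(' ')
puts "Found string '#{text}'"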

Here I create a new "tesseract" engine object, tell it that the language is English, and restrict it to a specific set of characters (in this case, only capital letters).
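
As a safety net, the recognised text can also be filtered after the fact, in case stray characters such as line breaks end up in the output. A minimal sketch, assuming the same capital-letter set as the whitelist; `clean_ocr_text` and `ALLOWED_CHARS` are hypothetical names, not part of the tesseract-ocr gem, and the input string is made up.

```ruby
# Hypothetical helper: normalise raw OCR output and drop any character
# outside the allowed set, mirroring the engine whitelist.
ALLOWED_CHARS = ('A'..'Z').to_a.join.freeze

def clean_ocr_text(raw, allowed = ALLOWED_CHARS)
  raw.strip.delete(' ').each_char.select { |char| allowed.include?(char) }.join
end

# Made-up OCR output with spaces, a line break, a lowercase letter and a digit.
puts clean_ocr_text(" AB C\nd1E \n")
```

It could be applied to the engine output, for example `clean_ocr_text(eng.text_for(img_fin))`.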

The result!



The success rate is not really high!
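
To put a number on "not really high", one option is to run the decoder over a handful of CAPTCHAs whose answers are already known and count the matches. The sketch below is a hypothetical harness: the file names, the expected strings and the stub decoder are all placeholders, so the percentage it prints says nothing about the real OCR pipeline.

```ruby
# Hypothetical harness: compare decoded strings with known answers and
# report the fraction that match. The decoder is passed in as a callable.
def success_rate(samples, decoder)
  return 0.0 if samples.empty?

  hits = samples.count { |path, expected| decoder.call(path) == expected }
  hits.to_f / samples.size
end

# Placeholder data and a stub decoder so the sketch runs without OCR;
# these file names and answers are made up for illustration.
samples = {
  '1.png' => 'ABCD',
  '2.png' => 'EFGH',
  '3.png' => 'IJKL',
  '4.png' => 'MNOP'
}
stub_decoder = ->(path) { path == '1.png' ? 'ABCD' : 'XXXX' }

puts format('Success rate: %.0f%%', success_rate(samples, stub_decoder) * 100)
```

In a real run, the stub would be replaced by a lambda that wraps the resize, level, sharpen and `text_for` steps from above.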

