Tuesday, March 24, 2009

Digitizing Books One Word at a Time

   You know, there are a lot of smart people out there tapping what is probably the most underused computing resource around: bored humans.  Things like Games With A Purpose (GWAP) take the time you spend bored on the net and use it to improve search reliability, metadata, and all sorts of other interesting things.  Games like Foldit take on incredibly complex problems, like predicting how proteins fold, to help find cures for HIV/AIDS, cancer, and other diseases.
   However, you have to go and actively play those, and they take the form of a game.  There are others that use your brain without you even knowing.  One I am sure you have run across is CAPTCHA, the squiggly, distorted word you have to type in before a site will let you continue.
   You see this on Facebook, eBay, and even here on Blogger.  The point is that a computer cannot understand that squiggly word and thus cannot do whatever it is the webmaster is trying to prevent computers from doing.  However, that same inability of computers to read messy text is the main problem in using Optical Character Recognition (OCR) to digitize old books, newspapers, and other text.
   It used to be that a few people would sit around and try to fix all the errors that came out of the OCR, but that takes tons of man-hours and, really, who wants that job?  So, the CAPTCHA people thought it would be brilliant to team up with these archivists and put the 200,000,000 CAPTCHAs solved each day toward a worthy cause.
   The result: reCAPTCHA.  Just another example of how smart people can harness unused brain power for a worthy cause (or their own).  Hopefully, this type of thing will become standard fare, using the millions of brains connected to the internet to basically cloud compute ourselves into the future.
-Idaho Bob-
