Saturday, November 22, 2008

2122 Eagle Verification

"Nature" is on PBS right now, and the show is about bald eagles. I am reminded of the time I was standing at a vulture cage in a tiny community zoo somewhere in Wales. A woman and a little girl were standing next to me. They looked at the two vultures silently for a while, then the little girl said, "Mummy, is that a bald eagle?"

The mother answered, "I guess so. It's big and it's bald. Never understood what the Americans saw in such an ugly bird."

I cracked up.

---------------------

To the recycle center today. It was in the middle 20s, and there was a lot of wind. My hands haven't recovered yet. They burn.

But at least I got rid of the two huge garbage bags of shredded paper from the last "cleaning out the files" spasm. My shredder cuts small diamonds, and Jasper had found the bags and torn them open. Little diamonds of paper all over the house.

Jasper is going to kill me. He follows me everywhere (except to the den, which is the exclusive province of Miss Thunderfoot), but he doesn't follow - he leads. He runs in front of me, and then stops suddenly, flopping onto his side or back. I'm always tripping over him. I know he's asking for petting, but that's not the way to get it, and I don't encourage it by petting him when he does it. I've tried to discourage it by "walking through" him, pushing him out of the way with my feet, but it hasn't worked. I'm getting frustrated. It's especially dangerous at night.

---------------------

Chris, over at "Inane Thoughts...", wonders if others have noticed a change in word verification offerings. He says they seem to look more like words now, rather than a random collection of letters.

Yeah, I had noticed. Sometimes they even seem to have some application to the topic, or to carry a comment of their own. Downright eerie. They do seem to be less confusing than the random ones, although they are still not real words. They just look like they could be. Or are trying to be.

On that topic, I read something recently, and as usual I don't remember where, about using real words in the verification process.

Many organizations are putting books online. Thousands of paper books are being digitized by people who stand at scanners all day, feeding in pages. Software takes the scanner images, "reads" them, and translates them to text. Occasionally (or often, depending on the age, font, and condition of the book) a word can't be translated because letters are broken, or the ink is smudged, or the word is clear but not in any dictionary. The software can't figure out what the word should be.

Humans could easily figure out what the word should be, but there are simply too many such words to make paying people to do it cost effective.

Somebody had the great idea of using us to figure out what the word is. Many sites are now using those scans for word verification, and as a kind of bonus, they rent the lists, helping to support the book project. We are shown the scan, and we figure out what the word is. I've seen many of them. They're apparent from the smudged ink, the obvious "old book" look, and from the fact that two words are offered, one word being quite clear (that's the REAL verification word), the other being messy.

The folks digitizing the books collect our guesses on the messy words, which are tagged as to the book and location they came from, and after a certain number of people identify one as the same word, that information goes into the book text.
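
For the programmers among you, here is roughly how I imagine the vote counting might work. This is only a sketch in Python, not the actual system: the class name, the method names, and the threshold of three matching guesses are all my own assumptions, since I don't know what number the real project uses.

```python
from collections import Counter, defaultdict

# Assumed value: the real project's agreement threshold is not known to me.
AGREEMENT_THRESHOLD = 3


class WordVerifier:
    """Hypothetical sketch of the two-word check described above."""

    def __init__(self, threshold=AGREEMENT_THRESHOLD):
        self.threshold = threshold
        # (book, location) -> tally of guesses for the messy word
        self.guesses = defaultdict(Counter)
        # (book, location) -> word accepted into the book text
        self.accepted = {}

    def submit(self, control_answer, control_typed, scan_id, messy_typed):
        """Return True if the visitor passes verification."""
        # The clear word is the real test: it must match the known answer.
        if control_typed.strip().lower() != control_answer.strip().lower():
            return False

        # The messy word is only recorded as a vote, never graded.
        guess = messy_typed.strip().lower()
        if guess and scan_id not in self.accepted:
            self.guesses[scan_id][guess] += 1
            if self.guesses[scan_id][guess] >= self.threshold:
                self.accepted[scan_id] = guess
        return True


verifier = WordVerifier()
scan = ("some old book", "page 12, line 4")  # made-up tag for illustration
for _ in range(3):
    verifier.submit("morning", "morning", scan, "eagle")
print(verifier.accepted.get(scan))
```

The point of the clear word is that a guess on the messy word only counts if it comes from someone who got the known word right.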

Watch for them. At least now when there's frustration, perhaps a virtuous feeling can ameliorate it a tad.

1 comment:

Chris said...

So I'm glad I'm not the only one who noticed!

btw, you've been tagged.

Word verification right now: teatenic

I could have fun with that one!