My new pursuit of frustration

Having finished my last project, I needed some new task to keep myself out of trouble. So I decided to pick back up something I had started work on before we moved.

See, tucked away in storage, we have somewhere north of two thousand books. Mostly because I am physically incapable of getting rid of books. Even books I will never read again. (There was one notable exception to this rule, which I read and threw away as an insult to the written word.) With so many books, it was impossible for my bride and me to ever buy each other books as gifts, as there was a good chance we already had whatever we picked. Even worse, I found I was buying myself books I already had.

The library database was supposed to fix all of that. The only problem was that getting the data into the database was a time-consuming task that we quickly wearied of. So in the spirit of supporting child labor, I put it off until the Critter was old enough to do the drudge work.

Fast forward a couple of years. We left nearly all our books behind us, but have managed to fill 3 new bookshelves overflowing in the past 12 months. And meanwhile, there are some unused barcode readers lying outside my office at work for a project that's on hold for a little bit. 'Hey,' my brain says to me one day, as I trip over the barcode reader box for the hundredth time, 'aren't there barcodes on the back of books?'

Once upon a time in the distant past, I worked for a library systems company. In that time, I managed to pick up a thing or two about book catalogs. Mostly I know that no one older than 12 actually uses the Dewey Decimal system I was forced to learn as a kid, and that librarians can, on average, hold their liquor better than you'd think. But I figured this should be a reasonable head-start on building a personal library catalog.

I started to do some research on that barcode on the back of your books (called an 'ISBN' code, if you weren't aware), and how it translates into usable text. I'm not the first guy out there with the idea of cataloging his home library this way. This guy is. (At least, he seems to be the standard link from a Google search.)

Here's what I know so far:

  • The standard barcode, isn't.
  • Some publishers use UPC codes instead of the 'International Standard' ISBN codes. Apparently the justification for this difference is 'we wanted to screw with you'.
  • Even when they put the barcode on, it doesn't necessarily match the ISBN, because of a decently complicated checksum game which is supposed to 'help' you know you've scanned the right thing.
  • Almost every book reseller in the world puts their own price stickers (with yet another, proprietary barcode on it) right on top of the ISBN code. Meaning you have to peel the damned things off of each and every book before you can scan anything.
  • Even once you have the ISBN, there is no single online source of all the book data I'd like to reference that you don't have to pay for (Library of Congress? Surprisingly sparse information). The best source seems to be Amazon, which means 'scraping' the information from their standard search function, book by book.
  • The standard is changing Any Minute Now, with some publishers 'helpfully' getting an early start.
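For the curious, here is a rough sketch of that checksum game in Python. It assumes the scanner hands back the 13-digit number printed under the barcode (an EAN-13 code starting with 978), and that what you want is the 10-digit ISBN printed elsewhere on the book. The function names are my own invention, not from any particular tool, and it does nothing at all for the UPC-only books.

```python
def isbn10_check_digit(first9: str) -> str:
    """Check digit for the first nine digits of an ISBN-10 (weights 10 down to 2, mod 11)."""
    total = sum((10 - i) * int(d) for i, d in enumerate(first9))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def ean13_is_valid(code: str) -> bool:
    """True if a 13-digit code passes the EAN-13 checksum (weights alternate 1, 3)."""
    if len(code) != 13 or not code.isdigit():
        return False
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(code))
    return total % 10 == 0


def ean13_to_isbn10(code: str) -> str:
    """Turn a scanned 978-prefixed EAN-13 into the matching ISBN-10."""
    if not ean13_is_valid(code):
        raise ValueError(f"bad EAN-13 checksum: {code!r}")
    if not code.startswith("978"):
        raise ValueError(f"not a 978-prefixed book code: {code!r}")
    core = code[3:12]  # drop the 978 prefix and the EAN check digit
    return core + isbn10_check_digit(core)


print(ean13_to_isbn10("9780306406157"))
```

The last digit differs between the two forms because each uses its own checksum, which is why the scanned number never quite matches the ISBN on the copyright page.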

    I began to sense that all of this meant that I was doomed to start writing & troubleshooting some overly-complicated scripts to translate the barcodes and get the information (or at least, I'd have to port what that guy wrote onto my server), which I looked forward to with the same enthusiasm I muster for having my prostate checked.

    Finally, I ran across some shareware that eliminates some of the above problems. (At least, it deals with the barcode and pings Amazon, Barnes & Noble and a number of other sources for you to get the information in usable form.) This is then put into an Access database, which I can import into my online version.

    Figuring all of this out took me most of one evening. The next day, I tried scanning in a list of books to test it out, with the aforementioned hassle of having to peel stickers off of about half the books (fortunately, books bought online don't have them, and I also tend to peel them off as I'm reading the book sometimes). It takes me about 20-25 minutes per shelf to scan them all in and put them back at the moment. I haven't run the 'go out and fetch the info' part of the program for more than a test sample yet, but that seems to take about 1-3 minutes per book. At this rate, I'll be done just in time to move back to California and fetch the others out of storage.

    There has just got to be an easier way.