Sunday, May 29, 2011

How to convert a multi-page HTML e-book for Kindle

Recently I got a link to an interesting e-book, The Architecture of Open Source Applications. Because I prefer to read books on my Kindle and there was no MOBI version, I decided to prepare one myself. Here is a step-by-step guide to converting multi-page HTML into a format suitable for Kindle on Ubuntu, Debian, or another Linux distribution.

Getting data

First of all, we need to get the HTML, CSS, and image files onto the local machine. On my machine it's as simple as:
$ mkdir ~/aosabook; cd ~/aosabook
$ wget -I en,images -nd -r -k -l 3 http://www.aosabook.org/en
This downloads all documents recursively (-r), but only from the en and images directories (-I en,images), and follows links at most three levels deep (-l 3). It does not recreate the remote directory structure locally (-nd), and it rewrites paths in the HTML documents so they point to the local copies (-k). For more details, check man wget.
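The same download can be spelled with wget's long option names, which are easier to read back later. This is only a sketch: the function name fetch_book is made up for illustration, and the options are one-to-one equivalents of the short flags used above.

```shell
#!/bin/sh
# Hypothetical helper: fetch_book is an illustrative name, not part of wget.
# Each long option corresponds to a short flag from the command above.
fetch_book() {
    # --recursive            = -r   follow links
    # --level=3              = -l 3 at most three levels deep
    # --include-directories  = -I   only follow these directories
    # --no-directories       = -nd  put every file in the current directory
    # --convert-links        = -k   rewrite links to point at local copies
    wget --recursive --level=3 \
         --include-directories=en,images \
         --no-directories \
         --convert-links \
         "$1"
}
```

After running cd ~/aosabook, calling fetch_book http://www.aosabook.org/en should behave the same as the short-flag command.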

Convert to single HTML

Now all the data is downloaded in ~/aosabook. To check whether the book is readable, just open file:///home/<username>/aosabook/ in a browser, replacing <username> with your own user name.
Because the book is spread across multiple pages, we need an additional step: converting the multi-page document into a single page. I used the htmldoc utility for this. Run htmldoc and do the following:
  • Input tab
    • choose Document Type: Book
    • use the Add Files button to add all HTML files from ~/aosabook/
    • select a cover image
  • Output tab
    • set output to file
    • set output path to ~/aosabook/aosabook.html
    • set output format to HTML
Experiment with the other options, notably the width (set it to 600px for the Kindle 3), and hit Generate.
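htmldoc can also be driven from the command line, which is handy if you want to repeat the conversion. The following is a rough sketch, not what I actually ran: the function name build_single_html is made up, I'm assuming the width setting corresponds to htmldoc's --browserwidth option, and the cover image is left out.

```shell
#!/bin/sh
# Hypothetical helper: build_single_html is an illustrative name.
# Usage: build_single_html OUTPUT.html INPUT1.html [INPUT2.html ...]
build_single_html() {
    output_file=$1
    shift
    # --book            treat the inputs as chapters of one book
    # -t html           write HTML rather than PDF or PostScript
    # --browserwidth    assumed equivalent of the GUI width setting
    # -f                output file
    htmldoc --book -t html --browserwidth 600 -f "$output_file" "$@"
}
```

For example: build_single_html ~/aosabook/aosabook.html ~/aosabook/*.html. Note that the shell expands *.html in alphabetical order, which may not match the chapter order, and that on a second run the glob would also pick up aosabook.html itself, so listing the files explicitly is safer.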

Convert to MOBI

To convert to the MOBI format suitable for Kindle, I simply added the generated aosabook.html to Calibre as a book (you do use Calibre for your Kindle management, right?), then clicked the book to convert it to MOBI and upload it to the device.
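If you prefer the terminal, Calibre ships a command-line converter called ebook-convert that picks the output format from the file extension. Here is a minimal sketch, assuming Calibre is installed and ebook-convert is on your PATH; the wrapper name convert_to_mobi is made up, and uploading to the device is still left to the Calibre GUI.

```shell
#!/bin/sh
# Hypothetical helper: convert_to_mobi is an illustrative name.
# Usage: convert_to_mobi INPUT.html  (writes INPUT.mobi next to it)
convert_to_mobi() {
    input_file=$1
    # Strip a trailing .html and append .mobi; ebook-convert infers
    # the target format from the output file's extension.
    output_file="${input_file%.html}.mobi"
    ebook-convert "$input_file" "$output_file"
}
```

For example, convert_to_mobi ~/aosabook/aosabook.html should produce ~/aosabook/aosabook.mobi.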