Convert Word Document to Clean HTML

I often receive website copy or press releases as Word documents with links and formatting in them. The manual way to convert one to HTML is to copy and paste all of the text from the Word doc into Notepad or TextEdit as plain text, replace special characters such as curly double quotes and apostrophes, and then add paragraph tags and formatting tags such as <strong> and <em> (or <b> and <i>). If the copy is going on a WordPress site or another platform that automatically converts new lines into paragraphs, you can skip the <p> and <br /> tags.
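If you do need the paragraph tags, that part of the manual process can be scripted. Here is a rough Python sketch, not a finished tool. It assumes the copy has already been pasted as plain text with a blank line between paragraphs, and the function name text_to_paragraphs is made up for this example. It escapes &, < and >, so it is only for plain text that has no HTML tags in it yet.

```python
import html
import re


def text_to_paragraphs(text: str) -> str:
    """Wrap each blank-line-separated block of plain text in <p> tags."""
    # Split wherever one or more blank lines separate two blocks of text.
    blocks = re.split(r"\n\s*\n", text.strip())
    # Escape &, < and > so they display as text; leave quotes untouched.
    return "\n".join(
        f"<p>{html.escape(block.strip(), quote=False)}</p>"
        for block in blocks
        if block.strip()
    )


# Example usage with a short two-paragraph sample.
sample = "First paragraph, Q&A style.\n\nSecond paragraph."
print(text_to_paragraphs(sample))
```

You would still add <strong>, <em> and links by hand afterwards.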

The good news is I have a tool that will help boost your productivity in converting Word documents to clean HTML:

Convert Word Doc to HTML

The bad news is that you will still need to do some cleanup after the conversion. But there is more good news: I have come up with a process that makes this task efficient.

Instructions:

  1. Convert your Word doc into HTML.
  2. Copy and paste the HTML into Notepad and save it as content.html.
  3. Find all instances of curly double quotes, curly apostrophes, and em-dashes. Replace them with straight double quotes, straight apostrophes, and &mdash;, respectively. Otherwise, you’ll end up with gibberish on your page.
  4. Open content.html in a web browser.
  5. Copy and paste from your web browser into the content box in WordPress or into the content box of your press release distribution website.
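
If you do this often, the find-and-replace in step 3 can be automated. Below is a minimal Python sketch of that step. The names REPLACEMENTS, straighten and clean_file are my own for illustration, and the mapping covers only the characters listed in step 3, so extend it if your documents contain other special characters.

```python
from pathlib import Path

# Curly punctuation from step 3 and what to replace each character with.
# This mapping is an assumption based on the step; add more entries as needed.
REPLACEMENTS = {
    "\u201c": '"',        # left curly double quote
    "\u201d": '"',        # right curly double quote
    "\u2018": "'",        # left curly single quote
    "\u2019": "'",        # right curly single quote (apostrophe)
    "\u2014": "&mdash;",  # em-dash
}


def straighten(html_text: str) -> str:
    """Return html_text with the curly characters above replaced."""
    for curly, plain in REPLACEMENTS.items():
        html_text = html_text.replace(curly, plain)
    return html_text


def clean_file(path: str) -> None:
    """Rewrite the file at path in place, assuming it is saved as UTF-8."""
    file_path = Path(path)
    cleaned = straighten(file_path.read_text(encoding="utf-8"))
    file_path.write_text(cleaned, encoding="utf-8")
```

To use it on the file from step 2, call clean_file("content.html") from the folder where you saved it. Notepad may save in a different encoding by default, so choose UTF-8 when saving or change the encoding argument to match.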