Python: Convert HTML to Word

While HTML is designed for online viewing, Word documents are commonly used for printing and physical documentation. Converting HTML to Word ensures that the content is optimized for printing, allowing for accurate page breaks, headers, footers, and other necessary elements for professional documentation purposes. In this article, we will explain how to convert HTML to Word in Python using Spire.Doc for Python.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip commands.

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows

Convert an HTML File to Word with Python

You can easily convert an HTML file to Word format by using the Document.SaveToFile() method provided by Spire.Doc for Python. The detailed steps are as follows.

  • Create an object of the Document class.
  • Load an HTML file using Document.LoadFromFile() method.
  • Save the HTML file to Word format using Document.SaveToFile() method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Specify the input and output file paths
inputFile = "Input.html"
outputFile = "HtmlToWord.docx"

# Create an object of the Document class
document = Document()
# Load an HTML file 
document.LoadFromFile(inputFile, FileFormat.Html, XHTMLValidationType.none)

# Save the HTML file to a .docx file
document.SaveToFile(outputFile, FileFormat.Docx2016)
document.Close()

Python: Convert HTML to Word

Convert an HTML String to Word with Python

To convert an HTML string to Word, you can use the Paragraph.AppendHTML() method. The detailed steps are as follows.

  • Create an object of the Document class.
  • Add a section to the document using Document.AddSection() method.
  • Add a paragraph to the section using Section.AddParagraph() method.
  • Append an HTML string to the paragraph using Paragraph.AppendHTML() method.
  • Save the result document using Document.SaveToFile() method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Specify the output file path
outputFile = "HtmlStringToWord.docx"

# Create an object of the Document class
document = Document()
# Add a section to the document
sec = document.AddSection()

# Add a paragraph to the section
paragraph = sec.AddParagraph()

# Specify the HTML string
htmlString = """
<html>
<head>
    <title>HTML to Word Example</title>
    <style>
        body {
            font-family: Arial, sans-serif;
        }
        h1 {
            color: #FF5733;
            font-size: 24px;
            margin-bottom: 20px;
        }
        p {
            color: #333333;
            font-size: 16px;
            margin-bottom: 10px;
        }
        ul {
            list-style-type: disc;
            margin-left: 20px;
            margin-bottom: 15px;
        }
        li {
            font-size: 14px;
            margin-bottom: 5px;
        }
        table {
            border-collapse: collapse;
            width: 100%;
            margin-bottom: 20px;
        }
        th, td {
            border: 1px solid #CCCCCC;
            padding: 8px;
            text-align: left;
        }
        th {
            background-color: #F2F2F2;
            font-weight: bold;
        }
        td {
            color: #0000FF;
        }
    </style>
</head>
<body>
    <h1>This is a Heading</h1>
    <p>This is a paragraph demonstrating the conversion of HTML to Word document.</p>
    <p>Here's an example of an unordered list:</p>
    <ul>
        <li>Item 1</li>
        <li>Item 2</li>
        <li>Item 3</li>
    </ul>
    <p>And here's a table:</p>
    <table>
        <tr>
            <th>Product</th>
            <th>Quantity</th>
            <th>Price</th>
        </tr>
        <tr>
            <td>Jacket</td>
            <td>30</td>
            <td>$150</td>
        </tr>
        <tr>
            <td>Sweater</td>
            <td>25</td>
            <td>$99</td>
        </tr>
    </table>
</body>
</html>
"""

# Append the HTML string to the paragraph
paragraph.AppendHTML(htmlString)

# Save the result document
document.SaveToFile(outputFile, FileFormat.Docx2016)
document.Close()

Python: Convert HTML to Word

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.