Best Java Libraries for HTML to Markdown and Vice Versa

In this tutorial, we will look at some of the best open-source libraries for Java and Android that we can use to:

  1. Convert HTML to Markdown
  2. Convert Markdown to HTML
  3. Convert Markdown to PDF

(a) flexmark-java

flexmark-java is a Java implementation of a CommonMark (spec 0.28) parser that uses the "blocks first, inlines after" Markdown parsing architecture.

Its strengths are speed, flexibility, extensibility, and an AST built from Markdown source elements. The AST tracks source positions down to the individual characters of the lexemes that make up each element.

The API allows granular control of the parsing process and is optimized for parsing with a large
number of installed extensions. The parser and extensions come with plenty of options for parser
behavior and HTML rendering variations.
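The "blocks first, inlines after" architecture is easier to see in a toy example. The sketch below is plain Java (JDK only) and is NOT flexmark's code. The first pass splits the source into block-level nodes; it recognizes only `# ` headings and paragraphs. A second pass then parses inline markup, here only `*emphasis*`, inside each finished block. The class and method names are hypothetical, and the toy does no HTML escaping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Toy illustration of "blocks first, inlines after"; NOT flexmark's implementation.
public class TwoPhaseSketch {
    // A block-level node: its HTML tag and its raw (not yet inline-parsed) text.
    record Block(String tag, String text) {}

    // Phase 1: split the source into blocks. Only "# " headings and paragraphs are recognized.
    static List<Block> parseBlocks(String source) {
        List<Block> blocks = new ArrayList<>();
        StringBuilder paragraph = new StringBuilder();
        for (String line : source.split("\n", -1)) {
            if (line.isBlank()) {
                flushParagraph(paragraph, blocks);
            } else if (line.startsWith("# ")) {
                flushParagraph(paragraph, blocks);
                blocks.add(new Block("h1", line.substring(2).trim()));
            } else {
                if (paragraph.length() > 0) {
                    paragraph.append('\n'); // keep soft line breaks inside the paragraph
                }
                paragraph.append(line.trim());
            }
        }
        flushParagraph(paragraph, blocks);
        return blocks;
    }

    // Emits the pending paragraph (if any) as a block and clears the buffer.
    private static void flushParagraph(StringBuilder paragraph, List<Block> blocks) {
        if (paragraph.length() > 0) {
            blocks.add(new Block("p", paragraph.toString()));
            paragraph.setLength(0);
        }
    }

    // Phase 2: inline parsing runs only on the text of one finished block.
    private static final Pattern EMPHASIS = Pattern.compile("\\*([^*]+)\\*");

    static String parseInlines(String text) {
        return EMPHASIS.matcher(text).replaceAll("<em>$1</em>");
    }

    static String render(String source) {
        StringBuilder html = new StringBuilder();
        for (Block block : parseBlocks(source)) {
            html.append('<').append(block.tag()).append('>')
                .append(parseInlines(block.text()))
                .append("</").append(block.tag()).append(">\n");
        }
        return html.toString();
    }

    public static void main(String[] args) {
        System.out.print(render("# Title\n\nThis is *Sparta*"));
    }
}
```

Because blocks are fixed before any inline parsing happens, `"*not\n\nemphasis*"` renders as two literal paragraphs rather than one emphasized span. Emphasis cannot cross a block boundary.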

Step 1: Install

Install it via Gradle:

implementation 'com.vladsch.flexmark:flexmark-all:0.64.0'

On Android, also add the following packagingOptions to avoid build errors caused by duplicate files:

packagingOptions {
    exclude 'META-INF/LICENSE-LGPL-2.1.txt'
    exclude 'META-INF/LICENSE-LGPL-3.txt'
    exclude 'META-INF/LICENSE-W3C-TEST'
    exclude 'META-INF/DEPENDENCIES'
}

For Maven, add flexmark-all as a dependency. It includes the core and all modules:

<dependency>
    <groupId>com.vladsch.flexmark</groupId>
    <artifactId>flexmark-all</artifactId>
    <version>0.64.0</version>
</dependency>

Step 2: Write Code

package com.vladsch.flexmark.samples;

import com.vladsch.flexmark.util.ast.Node;
import com.vladsch.flexmark.html.HtmlRenderer;
import com.vladsch.flexmark.parser.Parser;
import com.vladsch.flexmark.util.data.MutableDataSet;

public class BasicSample {
    public static void main(String[] args) {
        MutableDataSet options = new MutableDataSet();

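        // uncomment to set optional extensions (also requires importing java.util.Arrays,
        // com.vladsch.flexmark.ext.tables.TablesExtension and
        // com.vladsch.flexmark.ext.gfm.strikethrough.StrikethroughExtension)
        //options.set(Parser.EXTENSIONS, Arrays.asList(TablesExtension.create(), StrikethroughExtension.create()));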

        // uncomment to convert soft-breaks to hard breaks
        //options.set(HtmlRenderer.SOFT_BREAK, "<br />\n");

        Parser parser = Parser.builder(options).build();
        HtmlRenderer renderer = HtmlRenderer.builder(options).build();

        // You can re-use parser and renderer instances
        Node document = parser.parse("This is *Sparta*");
        String html = renderer.render(document);  // "<p>This is <em>Sparta</em></p>\n"
        System.out.println(html);
    }
}
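The commented-out SOFT_BREAK line converts soft breaks to hard breaks. A soft break is a plain newline inside a paragraph. The hypothetical JDK-only helper below is not flexmark's renderer; it only mimics the visible effect. By default the break is emitted as "\n", while a value such as "<br />\n" turns each one into an HTML line break.

```java
import java.util.List;

// Hypothetical helper showing what a soft-break setting changes; not flexmark's code.
public class SoftBreakSketch {
    // Joins the lines of one paragraph with the given soft-break string and wraps them in <p>.
    static String renderParagraph(String paragraphSource, String softBreak) {
        List<String> lines = paragraphSource.lines().toList();
        return "<p>" + String.join(softBreak, lines) + "</p>\n";
    }

    public static void main(String[] args) {
        String source = "first line\nsecond line";
        System.out.print(renderParagraph(source, "\n"));       // newline kept as-is
        System.out.print(renderParagraph(source, "<br />\n")); // rendered as a hard break
    }
}
```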

(b) CopyDown

CopyDown lets you convert HTML into Markdown with Java.

It is a Java port of the wonderful Turndown.js library.

Step 1: Install

You can install it via Gradle:

dependencies {
    implementation 'io.github.furstenheim:copy_down:1.0'
}

Or via Maven:

<dependencies>
    <dependency>
        <groupId>io.github.furstenheim</groupId>
        <artifactId>copy_down</artifactId>
        <version>1.0</version>
    </dependency>
</dependencies>

Step 2: Convert

Then call the convert() method to turn your HTML into Markdown:

import io.github.furstenheim.CopyDown;
public class Main {
    public static void main (String[] args) {
        CopyDown converter = new CopyDown();
        String myHtml = "<h1>Some title</h1><div>Some html<p>Another paragraph</p></div>";
        String markdown = converter.convert(myHtml);
        System.out.println(markdown);
        // Some title\n==========\n\nSome html\n\nAnother paragraph\n
    }
}

Options

You can pass options to control how the Markdown is generated:

import io.github.furstenheim.CopyDown;
import io.github.furstenheim.Options;
import io.github.furstenheim.OptionsBuilder;

public class Main {
   public static void main (String[] args) {
       OptionsBuilder optionsBuilder = OptionsBuilder.anOptions();
       Options options = optionsBuilder
               .withBr("-")
               // more options
               .build();
       CopyDown converter = new CopyDown(options);
       String myHtml = "<h1>Some title</h1><div>Some html<p>Another paragraph</p></div>";
       String markdown = converter.convert(myHtml);
       System.out.println(markdown);
   }
}

Available options:

Option              Valid values                  Default
headingStyle        SETEXT or ATX                 SETEXT
hr                  Any thematic break            * * *
bulletListMarker    -, +, or *                    *
codeBlockStyle      INDENTED or FENCED            INDENTED
fence               ``` or ~~~                    ```
emDelimiter         _ or *                        _
strongDelimiter     ** or __                      **
linkStyle           INLINED or REFERENCED         INLINED
linkReferenceStyle  FULL, COLLAPSED, or SHORTCUT  FULL
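To show what the headingStyle and emDelimiter options change in the output, here is a hypothetical JDK-only sketch. It mimics the documented behavior but is not CopyDown's source code. It assumes the Turndown.js convention: a SETEXT heading can only express levels 1 and 2 (underlined with "=" or "-" to the length of the text), so deeper levels fall back to "#" prefixes.

```java
// Hypothetical sketch of the headingStyle and emDelimiter options; not CopyDown's code.
public class CopyDownStyleSketch {
    enum HeadingStyle { SETEXT, ATX }

    // SETEXT underlines levels 1 and 2 ('=' and '-'); deeper levels, and ATX mode, use '#' prefixes.
    static String heading(String text, int level, HeadingStyle style) {
        if (style == HeadingStyle.SETEXT && level <= 2) {
            String underline = (level == 1 ? "=" : "-").repeat(text.length());
            return text + "\n" + underline + "\n";
        }
        return "#".repeat(level) + " " + text + "\n";
    }

    // Wraps text in the chosen emDelimiter ("_" by default, or "*").
    static String emphasis(String text, String emDelimiter) {
        return emDelimiter + text + emDelimiter;
    }

    public static void main(String[] args) {
        System.out.print(heading("Some title", 1, HeadingStyle.SETEXT)); // default style
        System.out.print(heading("Some title", 1, HeadingStyle.ATX));
        System.out.println(emphasis("Sparta", "_"));
    }
}
```

The SETEXT output for "Some title" has the same shape as the "Some title\n==========" result shown in the convert() example.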

(c) MDTool

MDTool is a tool that converts Markdown to HTML. Its features:

  1. Easy to use
  2. Supports basic Markdown syntax
  3. Supports table syntax
  4. Supports to-do lists
  5. No dependencies on other JARs

Step 1: Install it

Pull it from the Maven Central Repository:

<dependency>
    <groupId>com.youbenzi</groupId>
    <artifactId>MDTool</artifactId>
    <version>1.2.4</version>
</dependency>

Or download the source code from the project's repository.

Step 2: Convert

You can then convert your Markdown to HTML with either of the following calls:

MDTool.markdown2Html(new File(markdown_file_path), charset);

or

MDTool.markdown2Html(markdown_content);

(d) markdown2document

This Java library turns Markdown files into PDF or HTML documents. You can also use CSS styles to format your documents.

Step 1: Install it

Simply copy the code into your project.

Step 2: Use

First, assemble a Document object. Then use a Generator to generate your document.

There are currently two generators in the library:

  • PdfGenerator
  • HtmlGenerator

Both have a generate(Document document) method that returns an Output object.

Here is an example of how to use it as a library:

ContentFactory contentFactory = ContentFactory.getInstance();

Document document = Document
        .builder()
        .markdownContents(Arrays.asList(
                contentFactory.create("# Sample header \n sample content "),
                contentFactory.create(new URL("https://floppylab.com/resources/markdown2document/sample.md"))
        ))
        .styles(Arrays.asList(
                contentFactory.create("body { font-family: sans-serif; color: #555; /* some comment*/ }"),
                new Link("https://floppylab.com/resources/markdown2document/sample.css")
        )).build();

PdfGenerator pdfGenerator = new PdfGenerator();
Output output = pdfGenerator.generate(document);
output.toFile("sample.pdf");

HtmlGenerator htmlGenerator = new HtmlGenerator();
output = htmlGenerator.generate(document);
output.toFile("sample.html");

Document

A document can consist of:

  • a list of markdown contents – List<Content> markdownContents
  • a list of style inputs – List<Input> styleInputs
  • base URI for relative links (images, links, etc.) – String baseUri

Input

Input is an abstract class and there are two classes that extend it:

  • Content
  • Link

Content

A Content object can be constructed directly from:

  • String – as text

It can also be created by ContentFactory from:

  • String – as text
  • URL – URL where the content can be found
  • Path – path to a file, with an optional Charset
  • InputStream – stream with the content

Link

A Link object can be constructed from:

  • String – link

Output

An Output object contains the contents of the generated document, and it has the following methods:

  • toString – returns the content as a String
  • toOutputStream – returns the content as an OutputStream
  • toFile(String name) – writes the content into a file with the given name (FileOutputStream is used)

(e) jHTML2Md

A simple converter from HTML to Markdown in Java.

It currently has no options.

Step 1: Installation

Simply copy the source code into your project.

Step 2: Use it

It’s pretty simple: first add jsoup to the classpath, then call:

String markdownText = HTML2Md.convert(html, baseURL);

Here, html is a String containing the HTML you want to convert, and baseURL is the URL used as a reference for converting relative links.

You can also pass a URL directly, like this:

    URL url = new URL("http://www.example.com/");
    HTML2Md.convert(url, 30000);

The 30000 is the timeout for requesting the page in milliseconds.
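The baseURL argument matters because HTML pages often contain relative href and src values. The JDK-only sketch below shows what resolving a link against a base URL means, using java.net.URI.resolve. The class name is hypothetical, and the sketch does not reproduce how jHTML2Md or jsoup do this internally.

```java
import java.net.URI;

// Illustrates resolving relative links against a base URL with java.net.URI;
// this is the general technique, not jHTML2Md's own code.
public class BaseUrlSketch {
    // Turns a (possibly relative) link into an absolute URL using the page's base URL.
    static String absolutize(String baseUrl, String link) {
        return URI.create(baseUrl).resolve(link).toString();
    }

    public static void main(String[] args) {
        String base = "http://www.example.com/docs/page.html";
        System.out.println(absolutize(base, "img/logo.png"));        // relative to /docs/
        System.out.println(absolutize(base, "/about"));              // relative to the host root
        System.out.println(absolutize(base, "../index.html"));       // ".." is normalized away
        System.out.println(absolutize(base, "https://other.org/x")); // already absolute, unchanged
    }
}
```

Without a base URL, a link such as "img/logo.png" would stay relative in the generated Markdown and break once the text is moved elsewhere.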
