Best Java Libraries for HTML to Markdown and Vice Versa
In this tutorial we will look at some of the best open-source libraries for Java and Android that we can use to:
- Convert HTML to Markdown
- Convert Markdown to HTML
- Convert Markdown to PDF
(a). flexmark-java
flexmark-java is a Java implementation of the CommonMark (spec 0.28) parser that uses the
blocks-first, inlines-after Markdown parsing architecture.
Its strengths are speed, flexibility, extensibility, and an AST based on Markdown source
elements, with source positions tracked down to the individual characters of the lexemes
that make up each element.
The API allows granular control of the parsing process and is optimized for parsing with a large
number of installed extensions. The parser and extensions come with plenty of options for parser
behavior and HTML rendering variations.
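To make the blocks-first, inlines-after idea and the source-position tracking more concrete, here is a toy sketch in plain Java. It is not flexmark-java's implementation or API: the names TwoPhaseSketch, Block, splitBlocks and renderInline are made up for illustration, and the sketch assumes \n line endings and handles only blank-line separated paragraphs and *emphasis*.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Toy illustration only: not flexmark-java's classes or algorithm.
final class TwoPhaseSketch {

    /** A paragraph block that remembers its character offsets in the source. */
    static final class Block {
        final int start;   // offset of the first character (inclusive)
        final int end;     // offset just past the last character (exclusive)
        final String text; // raw source text of the block

        Block(int start, int end, String text) {
            this.start = start;
            this.end = end;
            this.text = text;
        }
    }

    // Matches a run of text wrapped in single asterisks, with no nested asterisks.
    private static final Pattern EMPHASIS = Pattern.compile("\\*([^*]+)\\*");

    /** Phase 1: split the source into paragraph blocks at blank lines. */
    static List<Block> splitBlocks(String source) {
        List<Block> blocks = new ArrayList<>();
        int length = source.length();
        int pos = 0;
        while (pos < length) {
            // Skip the newlines that separate blocks.
            while (pos < length && source.charAt(pos) == '\n') {
                pos++;
            }
            if (pos >= length) {
                break;
            }
            int next = source.indexOf("\n\n", pos);
            if (next < 0) {
                next = length;
            }
            // Drop trailing newlines so the offsets cover only the block's text.
            int end = next;
            while (end > pos && source.charAt(end - 1) == '\n') {
                end--;
            }
            blocks.add(new Block(pos, end, source.substring(pos, end)));
            pos = next;
        }
        return blocks;
    }

    /** Phase 2: resolve inline markup inside one block (no HTML escaping here). */
    static String renderInline(String text) {
        return EMPHASIS.matcher(text).replaceAll("<em>$1</em>");
    }

    /** Runs both phases and wraps each block in a paragraph tag. */
    static String toHtml(String source) {
        StringBuilder html = new StringBuilder();
        for (Block block : splitBlocks(source)) {
            html.append("<p>").append(renderInline(block.text)).append("</p>\n");
        }
        return html.toString();
    }

    public static void main(String[] args) {
        String source = "This is *Sparta*\n\nSecond block";
        for (Block block : splitBlocks(source)) {
            System.out.println("[" + block.start + ", " + block.end + ") " + block.text);
        }
        System.out.print(toHtml(source));
    }
}
```

Because every block is found before any inline markup is touched, each block can carry its own source offsets, and inline handling never has to worry about block boundaries.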
Step 1: Install
Install it via Gradle:
implementation 'com.vladsch.flexmark:flexmark-all:0.64.0'
Android projects may need these additional settings in the android block because of duplicate files:
packagingOptions {
    exclude 'META-INF/LICENSE-LGPL-2.1.txt'
    exclude 'META-INF/LICENSE-LGPL-3.txt'
    exclude 'META-INF/LICENSE-W3C-TEST'
    exclude 'META-INF/DEPENDENCIES'
}
For Maven, add flexmark-all as a dependency; it includes the core and all modules:
<dependency>
    <groupId>com.vladsch.flexmark</groupId>
    <artifactId>flexmark-all</artifactId>
    <version>0.64.0</version>
</dependency>
Step 2: Write Code
package com.vladsch.flexmark.samples;

import com.vladsch.flexmark.util.ast.Node;
import com.vladsch.flexmark.html.HtmlRenderer;
import com.vladsch.flexmark.parser.Parser;
import com.vladsch.flexmark.util.data.MutableDataSet;

public class BasicSample {
    public static void main(String[] args) {
        MutableDataSet options = new MutableDataSet();

        // uncomment to set optional extensions
        // (also needs imports for Arrays, TablesExtension and StrikethroughExtension)
        //options.set(Parser.EXTENSIONS, Arrays.asList(TablesExtension.create(), StrikethroughExtension.create()));

        // uncomment to convert soft-breaks to hard breaks
        //options.set(HtmlRenderer.SOFT_BREAK, "<br />\n");

        Parser parser = Parser.builder(options).build();
        HtmlRenderer renderer = HtmlRenderer.builder(options).build();

        // You can re-use parser and renderer instances
        Node document = parser.parse("This is *Sparta*");
        String html = renderer.render(document); // "<p>This is <em>Sparta</em></p>\n"
        System.out.println(html);
    }
}
(b). Copy_Down
CopyDown lets you convert HTML into Markdown with Java.
It is a Java port of the wonderful Turndown.js library.
Step 1: Install
You can install it via Gradle:
dependencies {
    implementation 'io.github.furstenheim:copy_down:1.0'
}
Or via Maven:
<dependencies>
    <dependency>
        <groupId>io.github.furstenheim</groupId>
        <artifactId>copy_down</artifactId>
        <version>1.0</version>
    </dependency>
</dependencies>
Step 2: Convert
Then use the convert() method to turn your HTML into Markdown:
import io.github.furstenheim.CopyDown;

public class Main {
    public static void main(String[] args) {
        CopyDown converter = new CopyDown();
        String myHtml = "<h1>Some title</h1><div>Some html<p>Another paragraph</p></div>";
        String markdown = converter.convert(myHtml);
        System.out.println(markdown);
        // Some title\n==========\n\nSome html\n\nAnother paragraph\n
    }
}
Options
You can pass options to control how the Markdown is generated:
import io.github.furstenheim.CopyDown;
import io.github.furstenheim.Options;
import io.github.furstenheim.OptionsBuilder;

public class Main {
    public static void main(String[] args) {
        OptionsBuilder optionsBuilder = OptionsBuilder.anOptions();
        Options options = optionsBuilder
            .withBr("-")
            // more options
            .build();
        CopyDown converter = new CopyDown(options);
        String myHtml = "<h1>Some title</h1><div>Some html<p>Another paragraph</p></div>";
        String markdown = converter.convert(myHtml);
        System.out.println(markdown);
    }
}
Option | Valid values | Default |
---|---|---|
headingStyle | SETEXT or ATX | SETEXT |
hr | Any thematic break | * * * |
bulletListMarker | - , + , or * | * |
codeBlockStyle | INDENTED or FENCED | INDENTED |
fence | three backticks or three tildes | three backticks |
emDelimiter | _ or * | _ |
strongDelimiter | ** or __ | ** |
linkStyle | INLINED or REFERENCED | INLINED |
linkReferenceStyle | FULL , COLLAPSED , or SHORTCUT | FULL |
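As a rough illustration of what the headingStyle and linkStyle options change in the output, the sketch below renders the same heading and link in both styles using plain Java. It does not call CopyDown: the class and method names are hypothetical, and the library's exact spacing and reference numbering may differ.

```java
// Illustration of the Markdown styles only; CopyDown is not used here.
final class MarkdownStyleSketch {

    /** ATX style: one leading '#' per heading level. */
    static String atxHeading(String text, int level) {
        return repeat('#', level) + " " + text;
    }

    /** Setext style: text underlined with '=' (level 1) or '-' (level 2). */
    static String setextHeading(String text, int level) {
        if (level < 1 || level > 2) {
            // Setext defines only two levels, so deeper headings use ATX here.
            return atxHeading(text, level);
        }
        char underline = (level == 1) ? '=' : '-';
        return text + "\n" + repeat(underline, text.length());
    }

    /** Inlined link: the URL sits right next to the link text. */
    static String inlinedLink(String text, String url) {
        return "[" + text + "](" + url + ")";
    }

    /** Referenced link (full form): the text points at a labelled definition. */
    static String referencedLink(String text, int label) {
        return "[" + text + "][" + label + "]";
    }

    /** Definition line that a referenced link points at. */
    static String linkDefinition(int label, String url) {
        return "[" + label + "]: " + url;
    }

    private static String repeat(char c, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(setextHeading("Some title", 1));
        System.out.println(atxHeading("Some title", 1));
        System.out.println(inlinedLink("docs", "https://example.com"));
        System.out.println(referencedLink("docs", 1));
        System.out.println(linkDefinition(1, "https://example.com"));
    }
}
```

The setext form is the one that appears in the sample output of the basic convert() example, where the title is followed by a line of equals signs.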
(c). MDTool
MDTool is a tool that converts Markdown to HTML.
- Easy to use
- Supports basic Markdown syntax
- Supports table syntax
- Supports todo lists
- No other jar dependencies
Step 1: Install it
Pull it from Maven Central Repository:
<dependency>
    <groupId>com.youbenzi</groupId>
    <artifactId>MDTool</artifactId>
    <version>1.2.4</version>
</dependency>
Or download the source code from the project's repository.
Step 2: Convert
You can then convert your Markdown to HTML using either of the following methods:
// assumes an overload that takes the charset as a second argument
MDTool.markdown2Html(new File(markdown_file_path), charset);
or
MDTool.markdown2Html(markdown_content);
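If you want explicit control over the file encoding, one option is to read the file yourself and hand the resulting string to the converter. The sketch below uses only the JDK. The convertFile helper, the demo method and the stand-in converter are hypothetical. In a real project you could pass MDTool::markdown2Html (the String variant shown above) as the converter instead.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;

final class CharsetAwareConversion {

    /** Reads the file with the given charset, then hands the text to the converter. */
    static String convertFile(Path markdownFile, Charset charset,
                              Function<String, String> converter) throws IOException {
        String markdown = new String(Files.readAllBytes(markdownFile), charset);
        return converter.apply(markdown);
    }

    /** Writes a small UTF-8 sample to a temp file and converts it with a stand-in. */
    static String demo() {
        // Stand-in for a real Markdown-to-HTML converter; it only wraps the text.
        Function<String, String> standIn = markdown -> "<pre>" + markdown + "</pre>";
        try {
            Path file = Files.createTempFile("sample", ".md");
            try {
                Files.write(file, "# Caf\u00e9".getBytes(StandardCharsets.UTF_8));
                return convertFile(file, StandardCharsets.UTF_8, standIn);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Passing the converter in as a Function keeps the file handling separate from the library call, so the same helper works with any String-to-String converter.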
(d). markdown2document
Turn Markdown files into a PDF or HTML document with the help of this Java library. You can also use CSS styles to format your documents.
Step 1: Install it
Simply copy the code into your project.
Step 2: Use
First, you have to assemble a Document object. Then you use a Generator to generate your document.
There are currently two generators in the library:
- PdfGenerator
- HtmlGenerator
Both have a generate(Document document) method, which returns an Output object.
Here is an example of how to use it as a library:
ContentFactory contentFactory = ContentFactory.getInstance();

// Assemble the document from Markdown contents and CSS styles
Document document = Document
    .builder()
    .markdownContents(Arrays.asList(
        contentFactory.create("# Sample header \n sample content "),
        contentFactory.create(new URL("https://floppylab.com/resources/markdown2document/sample.md"))
    ))
    .styles(Arrays.asList(
        contentFactory.create("body { font-family: sans-serif; color: #555; /* some comment*/ }"),
        new Link("https://floppylab.com/resources/markdown2document/sample.css")
    )).build();

// Generate a PDF file
PdfGenerator pdfGenerator = new PdfGenerator();
Output output = pdfGenerator.generate(document);
output.toFile("sample.pdf");

// Generate an HTML file from the same document
HtmlGenerator htmlGenerator = new HtmlGenerator();
output = htmlGenerator.generate(document);
output.toFile("sample.html");
Document
A document can consist of:
- a list of Markdown contents: List<Content> markdownContents
- a list of style inputs: List<Input> styleInputs
- a base URI for relative links (images, links, etc.): String baseUri
Input
Input is an abstract class and there are two classes that extend it:
- Content
- Link
Content
A Content object can be constructed directly from:
- String: as text
It can also be constructed by ContentFactory from:
- String: as text
- URL: URL where the content can be found
- Path: path to a file, with an optional Charset
- InputStream: stream with content
Link
A Link object can be constructed from:
- String: the link
Output
An Output object contains the contents of the generated document, and it has the following methods:
- toString: returns the content as a String
- toOutputStream: returns the content as an OutputStream
- toFile(String name): writes the content into a file with the given name (a FileOutputStream is used)
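The three methods can be pictured with a small stand-in class. This is a hypothetical sketch, not markdown2document's actual Output class: the name OutputSketch, the byte-array field and the UTF-8 decoding are assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in that mirrors the three methods listed above.
final class OutputSketch {
    private final byte[] content;

    OutputSketch(byte[] content) {
        this.content = content.clone(); // defensive copy
    }

    /** Returns the content as a String (UTF-8 is assumed in this sketch). */
    @Override
    public String toString() {
        return new String(content, StandardCharsets.UTF_8);
    }

    /** Returns the content as an in-memory OutputStream. */
    OutputStream toOutputStream() {
        ByteArrayOutputStream out = new ByteArrayOutputStream(content.length);
        out.write(content, 0, content.length);
        return out;
    }

    /** Writes the content into a file with the given name via FileOutputStream. */
    void toFile(String name) throws IOException {
        try (FileOutputStream out = new FileOutputStream(name)) {
            out.write(content);
        }
    }

    public static void main(String[] args) throws IOException {
        OutputSketch output =
            new OutputSketch("<h1>Sample header</h1>".getBytes(StandardCharsets.UTF_8));
        System.out.println(output); // println calls toString()

        File target = File.createTempFile("sample", ".html");
        output.toFile(target.getPath());
        System.out.println("Wrote " + target.length() + " bytes to " + target);
        target.delete();
    }
}
```

Holding the generated bytes in one place is what would let the same result be printed, streamed or saved without running the generator again.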
(e). jHTML2Md
A simple converter from HTML to Markdown in Java.
It currently has no options.
Step 1: Installation
Simply copy the source code into your project.
Step 2: Use it
Usage is simple: first add jsoup to the classpath, then call:
String markdownText = HTML2Md.convert(html, baseURL);
Here html is a String containing the HTML code you want to convert, and baseURL is the URL used as the reference for converting relative links.
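To picture what the base URL is for, the sketch below resolves a few relative links against a base address with the JDK's java.net.URI. This only illustrates relative-link resolution in general; it is not necessarily how jHTML2Md does it internally, and the class name, method name and example URLs are made up.

```java
import java.net.URI;

final class RelativeLinkSketch {

    /** Resolves href against baseUrl; absolute hrefs come back unchanged. */
    static String resolve(String baseUrl, String href) {
        return URI.create(baseUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String base = "http://www.example.com/docs/page.html";
        // Relative path: resolved against the directory of the base page.
        System.out.println(resolve(base, "images/logo.png"));
        // Root-relative path: resolved against the host.
        System.out.println(resolve(base, "/about"));
        // Absolute URL: the base is ignored.
        System.out.println(resolve(base, "https://other.example.org/x"));
    }
}
```

Without a base URL, a link such as images/logo.png in the converted Markdown would only work when the Markdown is viewed from the same location as the original page.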
You can also pass a URL directly, like this:
URL url = new URL("http://www.example.com/");
HTML2Md.convert(url, 30000);
The 30000 is the timeout for requesting the page, in milliseconds.