Class HTMLUtils
java.lang.Object
de.xima.fc.utils.HTMLUtils
- Author:
- XIMA MEDIA GmbH
Constructor Summary
Constructors
HTMLUtils()
Method Summary
Modifier and Type, Method, Description
static String getTextContentFromHtmlFragment(String htmlFragment)
    Given a string representing an HTML fragment, returns the text content of that HTML fragment.
static org.jsoup.nodes.Document parseDocumentLenient(String htmlFragment, String baseUri)
    Parses an HTML string and returns the document.
static org.jsoup.nodes.Document parseDocumentStrict(String htmlFragment, String baseUri)
    Parses an HTML string and returns the document.
static void writeXMLFileToTidyHTMLFile(File inputFile, File outputFile, String title)
    Reads an XML input file, tidies it up, wraps the output inside an HTML body, and writes the result to the output file.
Constructor Details
HTMLUtils
public HTMLUtils()
Method Details
getTextContentFromHtmlFragment
Given a string representing an HTML fragment, returns the text content of that HTML fragment. If the string cannot be parsed as HTML, returns a best estimate.
- Parameters:
htmlFragment - HTML fragment to parse
- Returns:
- The text content of the HTML fragment, or the default value if the HTML cannot be parsed.
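To make "text content" concrete, here is a minimal sketch using only the JDK. It is not the implementation behind getTextContentFromHtmlFragment (the return types on this page suggest the class builds on jsoup). It is a naive stand-in that strips tags with a regular expression, decodes a handful of entities, and collapses whitespace. The class name TextContentSketch and the method textContent are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in, NOT the HTMLUtils implementation: a naive,
// regex-based approximation of "the text content of an HTML fragment".
final class TextContentSketch {
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String textContent(String htmlFragment) {
        if (htmlFragment == null) {
            return "";
        }
        // Replace every tag with a space so that adjacent block elements
        // ("<p>a</p><p>b</p>") do not run together. Limitation: a tag in
        // the middle of a word ("wor<b>ld</b>") also introduces a space.
        String noTags = TAG.matcher(htmlFragment).replaceAll(" ");
        // Decode a few common entities; "&amp;" goes last so that
        // "&amp;lt;" becomes the literal text "&lt;" rather than "<".
        String decoded = noTags
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&nbsp;", " ") // simplified: regular space, not U+00A0
                .replace("&amp;", "&");
        // Collapse runs of whitespace and trim the ends.
        return WHITESPACE.matcher(decoded).replaceAll(" ").trim();
    }
}
```

A real HTML parser handles comments, script content, malformed markup, and the full entity table, none of which this sketch attempts.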
parseDocumentLenient
Parses an HTML string and returns the document. Errors are ignored and a best effort is made to parse the document.
- Parameters:
htmlFragment - The HTML fragment to parse
baseUri - The base URI of the document; can be empty.
- Returns:
- The parsed document
parseDocumentStrict
public static org.jsoup.nodes.Document parseDocumentStrict(String htmlFragment, String baseUri) throws HtmlParseException
Parses an HTML string and returns the document. If the fragment cannot be parsed, throws an exception.
- Parameters:
htmlFragment - The HTML fragment to parse
baseUri - The base URI of the document; can be empty.
- Returns:
- The parsed document
- Throws:
HtmlParseException - If the HTML fragment could not be parsed
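One way a caller might combine the strict and lenient variants is to try strict parsing first and fall back to lenient parsing when it fails. The sketch below shows that pattern in isolation, using only the JDK. The names ParseFallbackSketch, StrictParser, LenientParser, and parseWithFallback are hypothetical and not part of this API. Both parsers are passed in as functional interfaces so the example runs without the library.

```java
import java.util.function.Consumer;

// Hypothetical caller-side helper, not part of HTMLUtils: try a strict
// parser first and fall back to a lenient one if the strict parser throws.
final class ParseFallbackSketch {

    /** Shape of a strict parser: may throw when the input is unparseable. */
    @FunctionalInterface
    interface StrictParser<T> {
        T parse(String html, String baseUri) throws Exception;
    }

    /** Shape of a lenient parser: always returns a best-effort result. */
    @FunctionalInterface
    interface LenientParser<T> {
        T parse(String html, String baseUri);
    }

    static <T> T parseWithFallback(
            StrictParser<T> strict,
            LenientParser<T> lenient,
            String html,
            String baseUri,
            Consumer<Exception> onStrictFailure) {
        try {
            return strict.parse(html, baseUri);
        } catch (Exception e) {
            // Report the strict failure (for example, to a log), then
            // degrade to the best-effort result.
            onStrictFailure.accept(e);
            return lenient.parse(html, baseUri);
        }
    }
}
```

Assuming the signatures listed on this page, the method references HTMLUtils::parseDocumentStrict and HTMLUtils::parseDocumentLenient should fit the two parser parameters, with an empty string as the base URI. That pairing is an assumption and has not been checked against the library.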
writeXMLFileToTidyHTMLFile
Reads an XML input file, tidies it up, wraps the output inside an HTML body, and writes the result to the output file.
- Parameters:
inputFile - The XML input file
outputFile - The output file to write the HTML to. Will be created if missing.
title - The HTML title
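The read, tidy, wrap, and write steps can be sketched with the JDK alone. This is not the library's implementation, and what "tidies up" means there is not documented on this page. The sketch assumes it means re-indenting the XML, and that the result is shown as escaped text inside a pre element. The names XmlToHtmlSketch, toTidyHtml, and writeXmlFileToHtmlFile are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch, NOT the HTMLUtils implementation.
final class XmlToHtmlSketch {

    /** Escapes the characters that are significant in HTML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Re-indents the XML and wraps it, escaped, in a minimal HTML page. */
    static String toTidyHtml(Source xml, String title) {
        StringWriter pretty = new StringWriter();
        try {
            // Identity transform with indentation enabled ("tidy" is assumed
            // to mean pretty-printing; exact whitespace varies by JDK).
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            t.transform(xml, new StreamResult(pretty));
        } catch (TransformerException e) {
            // Sketch-level choice: surface malformed XML as unchecked.
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
        return "<!DOCTYPE html>\n<html>\n<head><title>" + escape(title)
                + "</title></head>\n<body>\n<pre>"
                + escape(pretty.toString().trim())
                + "</pre>\n</body>\n</html>\n";
    }

    /** File-based wrapper; Files.write creates the output file if missing. */
    static void writeXmlFileToHtmlFile(File inputFile, File outputFile, String title)
            throws IOException {
        String html = toTidyHtml(new StreamSource(inputFile), title);
        Files.write(outputFile.toPath(), html.getBytes(StandardCharsets.UTF_8));
    }
}
```

Passing the file to StreamSource, rather than reading it into a string first, lets the XML parser honor the encoding declared in the input file.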