Package de.xima.fc.utils
Class HTMLUtils
- java.lang.Object
  - de.xima.fc.utils.HTMLUtils

public class HTMLUtils extends Object
- Author: XIMA MEDIA GmbH

Constructor Summary
- HTMLUtils()

Method Summary
- static String getTextContentFromHtmlFragment(String htmlFragment)
  Given a string representing an HTML fragment, returns the text content of that HTML fragment.
- static org.jsoup.nodes.Document parseDocumentLenient(String htmlFragment, String baseUri)
  Parses an HTML string and returns the document, ignoring any parse errors.
- static org.jsoup.nodes.Document parseDocumentStrict(String htmlFragment, String baseUri)
  Parses an HTML string and returns the document, throwing an exception if the fragment cannot be parsed.
- static void writeXMLFileToTidyHTMLFile(File inputFile, File outputFile, String title)
  Reads an XML input file, tidies it up, wraps the output inside an HTML body, and writes the result to the output file.

Method Detail
-
getTextContentFromHtmlFragment
public static String getTextContentFromHtmlFragment(String htmlFragment)
Given a string representing an HTML fragment, returns the text content of that HTML fragment. If the string cannot be parsed as HTML, returns a best estimate of the text content.
- Parameters:
  - htmlFragment - The HTML fragment to parse
- Returns:
  - The text content of the HTML fragment, or the best-estimate fallback value if the HTML cannot be parsed.
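
The behavior described for getTextContentFromHtmlFragment can be approximated without this library or jsoup on the classpath. The following is a minimal sketch, not the implementation of HTMLUtils: it assumes the JDK's Swing HTML parser (javax.swing.text.html.parser.ParserDelegator) as a lenient stand-in, and the names TextContentSketch and textContent are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical stand-in, NOT the HTMLUtils implementation.
class TextContentSketch {

    /** Collects the text nodes of an HTML fragment using the JDK's lenient parser. */
    static String textContent(String htmlFragment) {
        StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback collector = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Called once per run of character data between tags.
                text.append(data);
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(htmlFragment), collector, true);
        } catch (IOException e) {
            // Assumption: fall back to the raw input as a "best estimate".
            return htmlFragment;
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(textContent("<p>Hello <b>world</b></p>"));
    }
}
```

How whitespace between text runs is preserved depends on the parser, so callers that compare the result against expected text may want to normalize whitespace first.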
-
parseDocumentLenient
public static org.jsoup.nodes.Document parseDocumentLenient(String htmlFragment, String baseUri)
Parses an HTML string and returns the document. Any errors are ignored, and a best effort is made to parse the document.
- Parameters:
  - htmlFragment - The HTML fragment to parse
  - baseUri - The base URI of the document; can be empty.
- Returns:
  - The parsed document
-
parseDocumentStrict
public static org.jsoup.nodes.Document parseDocumentStrict(String htmlFragment, String baseUri) throws HtmlParseException
Parses an HTML string and returns the document. If the fragment cannot be parsed, throws an exception.
- Parameters:
  - htmlFragment - The HTML fragment to parse
  - baseUri - The base URI of the document; can be empty.
- Returns:
  - The parsed document
- Throws:
  - HtmlParseException - If the HTML fragment could not be parsed
-
writeXMLFileToTidyHTMLFile
public static void writeXMLFileToTidyHTMLFile(File inputFile, File outputFile, String title)
Reads an XML input file, tidies it up, wraps the output inside an HTML body, and writes the result to the output file.
- Parameters:
  - inputFile - The XML input file
  - outputFile - The output file to write the HTML to. Will be created if missing.
  - title - The HTML title
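
A comparable XML-to-HTML step can be sketched with the JDK alone. This is only an illustration under stated assumptions, not the code behind writeXMLFileToTidyHTMLFile: "tidying" is taken to mean re-indenting the XML with an identity Transformer, the result is shown escaped inside a pre element, and the names XmlToHtmlSketch, toHtml, and writeXmlAsHtml are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch, NOT the HTMLUtils implementation.
class XmlToHtmlSketch {

    /** Escapes the characters that are significant in HTML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Wraps already-tidied XML text in a minimal HTML page with the given title. */
    static String toHtml(String tidyXml, String title) {
        return "<!DOCTYPE html>\n<html><head><meta charset=\"UTF-8\"><title>"
                + escape(title) + "</title></head>\n<body><pre>"
                + escape(tidyXml) + "</pre></body></html>\n";
    }

    /** Reads inputFile, re-indents the XML, and writes the HTML page to outputFile. */
    static void writeXmlAsHtml(File inputFile, File outputFile, String title)
            throws IOException, TransformerException {
        // Assumption: an identity transform with indentation stands in for "tidying".
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.INDENT, "yes");
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter tidy = new StringWriter();
        identity.transform(new StreamSource(inputFile), new StreamResult(tidy));

        // Files.write creates the file if it does not exist yet.
        Files.write(outputFile.toPath(),
                toHtml(tidy.toString(), title).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        File in = File.createTempFile("demo", ".xml");
        File out = File.createTempFile("demo", ".html");
        Files.write(in.toPath(), "<root><a>1</a></root>".getBytes(StandardCharsets.UTF_8));
        writeXmlAsHtml(in, out, "Demo");
        System.out.println(new String(Files.readAllBytes(out.toPath()), StandardCharsets.UTF_8));
    }
}
```

The sketch creates the output file when it is missing but does not create missing parent directories; whether the documented method does so is not stated.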