Class HTMLUtils

java.lang.Object
de.xima.fc.utils.HTMLUtils

public class HTMLUtils extends Object
Author:
XIMA MEDIA GmbH
  • Constructor Details

    • HTMLUtils

      public HTMLUtils()
  • Method Details

    • getTextContentFromHtmlFragment

      public static String getTextContentFromHtmlFragment(String htmlFragment)
      Given a string representing an HTML fragment, returns the text content of that fragment. If the string cannot be parsed as HTML, a best-effort estimate of the text content is returned.
      Parameters:
      htmlFragment - HTML fragment to parse
      Returns:
      The text content of the HTML fragment, or a best-effort estimate if the fragment cannot be parsed as HTML.
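      To illustrate what "text content" means here, the following is a rough, JDK-only approximation. It is not the implementation of HTMLUtils, which this page does not show. The class name TextContentSketch and the method textContent are hypothetical, and the tag-stripping regular expression is a deliberately naive stand-in for a real HTML parser.

```java
// Hypothetical helper, NOT part of de.xima.fc.utils.HTMLUtils.
final class TextContentSketch {

    private TextContentSketch() {
    }

    // Naive approximation: drop everything that looks like a tag,
    // then decode a handful of common character entities.
    static String textContent(String htmlFragment) {
        if (htmlFragment == null) {
            return "";
        }
        // Remove anything between '<' and the next '>'.
        String withoutTags = htmlFragment.replaceAll("<[^>]*>", "");
        // Decode "&amp;" last so that "&amp;lt;" becomes "&lt;" and not "<".
        return withoutTags
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&amp;", "&")
                .trim();
    }
}
```

      With this sketch, textContent("<p>Hello <b>world</b></p>") yields "Hello world". A stray unescaped "<" in the text, comments, and script or style content are not handled correctly, which is why a real parser is preferable in practice.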
    • parseDocumentLenient

      public static org.jsoup.nodes.Document parseDocumentLenient(String htmlFragment, String baseUri)
      Parses an HTML string and returns the document. Any errors are ignored; a best effort is made to parse the document.
      Parameters:
      htmlFragment - The HTML fragment to parse
      baseUri - The base URI of the document, can be empty.
      Returns:
      The parsed document
    • parseDocumentStrict

      public static org.jsoup.nodes.Document parseDocumentStrict(String htmlFragment, String baseUri) throws HtmlParseException
      Parses an HTML string and returns the document. If the fragment cannot be parsed, throws an exception.
      Parameters:
      htmlFragment - The HTML fragment to parse
      baseUri - The base URI of the document, can be empty.
      Returns:
      The parsed document
      Throws:
      HtmlParseException - If the HTML fragment could not be parsed
    • writeXMLFileToTidyHTMLFile

      public static void writeXMLFileToTidyHTMLFile(File inputFile, File outputFile, String title)
      Reads an XML input file, tidies it up, wraps the output inside an HTML body, and writes the result to the output file.
      Parameters:
      inputFile - the XML input file
      outputFile - the output file to write the HTML to. Will be created if missing.
      title - the HTML title
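      The read, tidy, wrap, and write steps can be sketched with JDK classes only. This is an illustration under stated assumptions, not the HTMLUtils implementation: the class XmlToHtmlSketch and its methods are hypothetical, "tidying" is assumed to mean re-indenting the XML, and the XML is assumed to be shown as escaped text inside a pre element.

```java
import java.io.File;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

// Hypothetical helper, NOT part of de.xima.fc.utils.HTMLUtils.
final class XmlToHtmlSketch {

    private XmlToHtmlSketch() {
    }

    static void writeXmlAsHtml(File inputFile, File outputFile, String title) throws Exception {
        // 1. Read and parse the XML input file.
        org.w3c.dom.Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(inputFile);

        // 2. "Tidy" it: serialize again with indentation (assumption).
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter xml = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(xml));

        // 3. Wrap the escaped XML inside an HTML body.
        String html = "<!DOCTYPE html>\n"
                + "<html><head><meta charset=\"UTF-8\"><title>"
                + escape(title == null ? "" : title)
                + "</title></head>\n<body><pre>"
                + escape(xml.toString())
                + "</pre></body></html>\n";

        // 4. Write the result; Files.write creates the file if it is missing.
        Files.write(outputFile.toPath(), html.getBytes(StandardCharsets.UTF_8));
    }

    // Escapes the characters that are significant in HTML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

      The parent directory of the output file must already exist in this sketch; whether writeXMLFileToTidyHTMLFile creates missing directories is not stated above.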