Java Open Source Projects
HTML Parser
HTML Parser is a Java library used to parse HTML in either a linear or nested fashion. Primarily used for transformation or extraction, it features filters, visitors, custom tags a...

CyberNeko HTML Parser
NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the information using standard XML interfaces. The parser...

HtmlCleaner
HtmlCleaner is an open-source HTML parser written in Java. HTML found on the Web is usually dirty, ill-formed and unsuitable for further processing. For any serious consumption of such do...

Cobra: Java HTML Renderer & Parser
Cobra is a pure Java HTML renderer and DOM parser that is being developed to support HTML 4, JavaScript and CSS 2. Cobra can be used as a JavaScript-aware and CSS-aware HTML DOM...

TagSoup
TagSoup is a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: poor, nasty and brutish, though quite of...

Jericho HTML Parser
Jericho HTML Parser is a simple but powerful Java library allowing analysis and manipulation of parts of an HTML document, including some common server-side tags, while reproducing...

HotSAX
HotSAX is a small, fast SAX2 parser for HTML, XHTML and XML.

JTidy
JTidy is a Java port of HTML Tidy, an HTML syntax checker and pretty printer. Like its non-Java cousin, JTidy can be used as a tool for cleaning up malformed and faulty HTML. In add...

HtmlRipper
HtmlRipper is a Java package that contains routines that enable dynamic data to be extracted from Web pages (HTML documents) using pre-defined rule sets. These routines allow you t...