The HTMLParser Java library helps you parse HTML web pages. It can be used to scrape information from a website. One possible use case: you want to expose the information on a website as a web service. HTMLParser provides command-line tools, lets you define parsing rules in Java code, and includes an intuitive user interface for defining parsing rules.
The following example defines a filter that searches for "h1" tags whose "class" attribute has the value "title" on developers-blog.org. The URL of the page to parse is passed as the first command-line argument.
import org.htmlparser.*;
import org.htmlparser.filters.*;
import org.htmlparser.beans.*;
import org.htmlparser.util.*;

public class DevelopersFilter
{
    public static void main (String[] args)
    {
        // Match tags named H1.
        TagNameFilter filter0 = new TagNameFilter ();
        filter0.setName ("H1");

        // Match tags whose "class" attribute has the value "title".
        HasAttributeFilter filter1 = new HasAttributeFilter ();
        filter1.setAttributeName ("class");
        filter1.setAttributeValue ("title");

        // Combine both filters: a node must satisfy both of them.
        NodeFilter[] array0 = new NodeFilter[2];
        array0[0] = filter0;
        array0[1] = filter1;
        AndFilter filter2 = new AndFilter ();
        filter2.setPredicates (array0);

        // Hand the combined filter to the bean that fetches and filters the page.
        NodeFilter[] array1 = new NodeFilter[1];
        array1[0] = filter2;
        FilterBean bean = new FilterBean ();
        bean.setFilters (array1);

        if (0 != args.length)
        {
            // The first argument is the URL of the page to parse.
            bean.setURL (args[0]);
            System.out.println (bean.getNodes ().toHtml ());
        }
        else
        {
            System.out.println ("Usage: java -classpath .:htmlparser.jar DevelopersFilter <url>");
        }
    }
}
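To make the logic of the combined filter easier to follow, here is a small stand-alone sketch that models it with plain Java predicates. It does not use HTMLParser: the `Tag` class and the helpers `tagName`, `hasAttribute`, `select` and `tag` are hypothetical names invented for this illustration, and the matching rules (case-insensitive tag name, exact attribute value) are assumptions of the sketch rather than a statement about the library's internals.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class FilterSketch {

    // Hypothetical stand-in for a parsed tag; not an HTMLParser class.
    public static final class Tag {
        public final String name;
        public final Map<String, String> attributes;

        public Tag(String name, Map<String, String> attributes) {
            this.name = name;
            this.attributes = attributes;
        }
    }

    // Plays the role of the tag name filter: compares names ignoring case.
    public static Predicate<Tag> tagName(String name) {
        return tag -> tag.name.equalsIgnoreCase(name);
    }

    // Plays the role of the attribute filter: the attribute must exist
    // and its value must equal the expected value exactly.
    public static Predicate<Tag> hasAttribute(String attribute, String value) {
        return tag -> value.equals(tag.attributes.get(attribute));
    }

    // Returns the tags accepted by the filter, in their original order.
    public static List<Tag> select(List<Tag> tags, Predicate<Tag> filter) {
        List<Tag> result = new ArrayList<>();
        for (Tag tag : tags) {
            if (filter.test(tag)) {
                result.add(tag);
            }
        }
        return result;
    }

    // Convenience helper: builds a tag with a single attribute.
    public static Tag tag(String name, String attribute, String value) {
        Map<String, String> attributes = new HashMap<>();
        attributes.put(attribute, value);
        return new Tag(name, attributes);
    }

    public static void main(String[] args) {
        List<Tag> page = Arrays.asList(
                tag("h1", "class", "title"),
                tag("h1", "class", "entry"),
                tag("div", "class", "title"));

        // Both conditions must hold, as with the AND combination above.
        Predicate<Tag> filter = tagName("H1").and(hasAttribute("class", "title"));

        for (Tag match : select(page, filter)) {
            System.out.println(match.name + " class=" + match.attributes.get("class"));
        }
    }
}
```

In this model only the first tag is selected: the second one has the right name but a different class value, and the third one has the right class but is not an "h1".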
Regards, Rafael Sobek
