Thursday, October 16, 2008

Java class to test XPath queries in XML documents

Something we have to do more and more these days is run XPath queries on XML documents, either interactively or programmatically, and sometimes we need to test the queries we develop. Eclipse has several plugins, in various states of quality, that can do this for you. If none of them works well enough, you'll need a different approach.
I'll show here a simple Java class, barely 100 lines of code, that uses the javax.xml.xpath.XPath class and the Apache Commons CLI package. Wrapped in a shell script, it lets you easily run XPath queries against arbitrary documents, specifying arbitrary namespaces.

Command-line usage

The "usage" string for the shell-script wrapper, "xpathfind", is the following:
xpathfind -p filepath {-x xpath}+ {-n pfx:ns}*

This means that you supply a single file path to a document, along with one or more XPath strings, and zero or more prefix and namespace pairs.
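Each "-n" value is split at its first colon only, because the namespace URI itself contains colons. That is why a value such as ":http://java.sun.com/xml/ns/javaee" yields an empty prefix bound to the whole URI. The standalone sketch below isolates just that parsing rule; the class and method names (PrefixPairDemo, parsePair) are hypothetical and not part of the original code.

```java
import java.util.AbstractMap;
import java.util.Map;

/** Hypothetical helper: splits a "pfx:ns" argument the way MapNamespaceContext does. */
public class PrefixPairDemo
{
    /** Returns (prefix, namespaceURI); the prefix is empty for ":uri". */
    public static Map.Entry<String, String> parsePair(String colonPair)
    {
        int colonIndex = colonPair.indexOf(':'); // FIRST colon only
        if (colonIndex < 0)
            throw new IllegalArgumentException("Expected pfx:ns, got \"" + colonPair + "\"");
        String prefix = colonPair.substring(0, colonIndex).trim();
        // Everything after the first colon is the URI, colons included.
        String uri = colonPair.substring(colonIndex + 1);
        return new AbstractMap.SimpleEntry<String, String>(prefix, uri);
    }

    public static void main(String[] args)
    {
        Map.Entry<String, String> dflt = parsePair(":http://java.sun.com/xml/ns/javaee");
        System.out.println("[" + dflt.getKey() + "] -> " + dflt.getValue());

        Map.Entry<String, String> web =
            parsePair("web:http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd");
        System.out.println("[" + web.getKey() + "] -> " + web.getValue());
    }
}
```

Running main prints "[] -> http://java.sun.com/xml/ns/javaee" followed by "[web] -> http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd". Splitting on the last colon instead would cut the URI apart.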

Example

Assuming we have the following "web.xml" file:
<web-app xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://java.sun.com/xml/ns/javaee" xmlns:web="http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd" xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd" id="WebApp_ID" version="2.5">
<display-name>mdkcarousel</display-name>
<welcome-file-list>
<welcome-file>index.html</welcome-file>
<welcome-file>index.htm</welcome-file>
<welcome-file>index.jsp</welcome-file>
<welcome-file>default.html</welcome-file>
<welcome-file>default.htm</welcome-file>
<welcome-file>default.jsp</welcome-file>
</welcome-file-list>
<filter id="abc">
<filter-name>No Caching Filter</filter-name>
<filter-class>carousel.NoCachingFilter</filter-class>
</filter>
<filter-mapping>
<filter-name>No Caching Filter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
</web-app>

The following command line and resulting output show some examples of what it can do:
% xpathfind -p etc/web.xml -n ":http://java.sun.com/xml/ns/javaee" -x "//:welcome-file[following-sibling::*/text()='index.htm']" -x "//:filter-name/text()" -x "//:filter/@id"
result = "index.html".
result = "No Caching Filter".
result = "abc".
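This example demonstrates a common roadblock people run into when writing XPath strings: how do you deal with the default namespace? In XPath 1.0, an unprefixed name such as "welcome-file" matches only elements in no namespace, so it never matches elements in a document's default namespace. The idea is simply to register the namespace with a blank prefix value. In the actual XPath string, you then write the colon with no prefix (":welcome-file"), as opposed to leaving out the prefix and colon altogether. This worked with the JDK's built-in XPath implementation used here, but an empty prefix is not standard XPath 1.0 syntax, so other XPath processors may reject it; binding the namespace to a non-empty prefix is the portable alternative.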
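For processors that do not accept the empty-prefix form, a portable approach is to bind the default namespace URI to any non-empty prefix of your own choosing and use that prefix in the query. The sketch below assumes a trimmed-down copy of the web.xml above and an arbitrary prefix "j"; the class name DefaultNamespaceDemo is hypothetical. It also shows that the unprefixed query finds nothing.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DefaultNamespaceDemo
{
    static final String JAVAEE = "http://java.sun.com/xml/ns/javaee";

    // Trimmed-down web.xml: every element is in the default namespace.
    static final String XML =
        "<web-app xmlns=\"" + JAVAEE + "\">"
        + "<welcome-file-list>"
        + "<welcome-file>index.html</welcome-file>"
        + "<welcome-file>index.htm</welcome-file>"
        + "</welcome-file-list>"
        + "</web-app>";

    /** Evaluates expr against XML with the arbitrary prefix "j" bound to JAVAEE. */
    public static String eval(String expr)
    {
        try
        {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for namespace-aware XPath
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext()
            {
                public String getNamespaceURI(String prefix)
                {
                    if (prefix == null)
                        throw new IllegalArgumentException("null prefix");
                    // Unbound prefixes map to "" (no namespace), per the interface contract.
                    return "j".equals(prefix) ? JAVAEE : XMLConstants.NULL_NS_URI;
                }

                public String getPrefix(String namespaceURI)
                {
                    return JAVAEE.equals(namespaceURI) ? "j" : null;
                }

                public Iterator<String> getPrefixes(String namespaceURI)
                {
                    return JAVAEE.equals(namespaceURI)
                        ? Collections.singletonList("j").iterator()
                        : Collections.<String>emptyList().iterator();
                }
            });
            return xpath.evaluate(expr, doc);
        }
        catch (Exception ex)
        {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args)
    {
        // Unprefixed: looks for <welcome-file> in NO namespace, so finds none.
        System.out.println(eval("count(//welcome-file)"));
        // Prefixed with "j", bound to the default namespace URI: finds both.
        System.out.println(eval("count(//j:welcome-file)"));
        // String value of the first match, like XPathFind prints.
        System.out.println(eval("//j:welcome-file"));
    }
}
```

Running main should print 0, 2 and index.html. The prefix in the query only has to agree with the NamespaceContext; it does not need to match anything written in the document.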

Now let's see how we do this.

The Code

The XPathFind class I show here uses classes in package javax.xml.xpath for XPath processing, and package org.apache.commons.cli to process command-line arguments. The associated MapNamespaceContext class implements the javax.xml.namespace.NamespaceContext interface, which the javax.xml.xpath.XPath class uses to resolve the prefixes that appear in XPath queries to namespace URIs. I'll list both of these classes without package or import statements for brevity; besides the packages just mentioned, the imports come from java.io, java.util, javax.xml.parsers, org.w3c.dom and org.xml.sax. In practice, you should not use the default package (the wrapper script below runs the class as xpathfind.XPathFind).
Also, the "xpathfind" shell script is a simple wrapper around "java" that calls the class and passes along the command-line arguments. This version is customized for Cygwin on Windows, which is why the classpath entries are separated with ";", the separator the Windows JVM expects. A version for "plain" Unix can easily be derived from it by using ":" as the classpath separator and dropping the cygpath call, and a Windows batch file is very straightforward to build (except for the annoyance of limited command-line arguments).

XPathFind.java

public class XPathFind
{
    public static void main(String[] args)
    {
        Option pathOption = new Option("p", "path", true, "Path to XML file");
        pathOption.setRequired(true);

        Option xpathOption = new Option("x", "xpath", true, "XPath expression to search for");
        xpathOption.setRequired(true);
        xpathOption.setArgs(Option.UNLIMITED_VALUES);

        Option namespaceOption = new Option("n", "namespace", true, "prefix:namespace to use in xpath");
        namespaceOption.setArgs(Option.UNLIMITED_VALUES);

        Options options = new Options();
        options.addOption(pathOption);
        options.addOption(xpathOption);
        options.addOption(namespaceOption);

        CommandLineParser parser = new PosixParser();

        try
        {
            CommandLine line = parser.parse(options, args);

            String filePath = line.getOptionValue("p");
            String[] xpaths = line.getOptionValues("x");
            String[] namespaces = line.getOptionValues("n"); // null if no -n was given

            File file = new File(filePath);
            if (!file.exists() || !file.canRead() || !file.isFile())
            {
                System.out.println("File \"" + filePath + "\" is not a readable file.");
                usage(options);
                System.exit(1);
            }

            go(filePath, xpaths, namespaces);
        }
        catch (MissingOptionException ex)
        {
            // A required option (-p or -x) was not supplied.
            usage(options);
            System.exit(1);
        }
        catch (ParseException ex)
        {
            ex.printStackTrace();
            System.exit(1);
        }
        catch (Exception ex)
        {
            // Parser configuration, I/O and XML parsing errors end up here.
            ex.printStackTrace();
            System.exit(1);
        }
    }

    private static void go(String filePath, String[] xpaths, String[] namespaces)
        throws FileNotFoundException, ParserConfigurationException, IOException, SAXException
    {
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setNamespaceAware(true); // never forget this!
        DocumentBuilder builder = domFactory.newDocumentBuilder();
        // parse(String) treats its argument as a URI; parse(File) takes a plain file path.
        Document doc = builder.parse(new File(filePath));

        XPathFactory xpathFactory = XPathFactory.newInstance();
        XPath xpath = xpathFactory.newXPath();
        if ((namespaces != null) && (namespaces.length > 0))
            xpath.setNamespaceContext(new MapNamespaceContext(namespaces));

        for (String xpathStr: xpaths)
        {
            try
            {
                XPathExpression xpathExpr = xpath.compile(xpathStr);

                // Evaluating as a string yields the result's string value; for a
                // node-set, that is the string value of the first node in document order.
                String result = xpathExpr.evaluate(doc);
                System.out.println("result = \"" + result + "\".");
            }
            catch (XPathExpressionException ex)
            {
                // The cause usually carries the useful message, but it can be null.
                Throwable cause = (ex.getCause() != null) ? ex.getCause() : ex;
                System.out.println("XPath \"" + xpathStr + "\" was invalid: " + cause.getMessage());
            }
        }
    }

    private static void usage(Options options)
    {
        HelpFormatter formatter = new HelpFormatter();
        formatter.printHelp("xpathfind -p filepath {-x xpath}+ {-n pfx:ns}*", options);
        System.out.println("Can specify one or more \"-x\" options, and zero or more \"-n\" options.");
    }
}
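Because XPathFind evaluates each expression as a string, a query such as "//:filter-name/text()" reports only the first match. If you want every match, one option is to evaluate with XPathConstants.NODESET and walk the returned NodeList. The sketch below is a hypothetical variation (class and method names are illustrative, and namespaces are left out for brevity; add a NamespaceContext as XPathFind does for namespaced documents).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AllMatchesDemo
{
    private static Document parse(String xml) throws Exception
    {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** What XPathFind prints: the string value (first node only for a node-set). */
    public static String firstOnly(String xml, String expr)
    {
        try
        {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expr, parse(xml));
        }
        catch (Exception ex)
        {
            throw new IllegalStateException(ex);
        }
    }

    /** The text content of every selected node, in document order. */
    public static List<String> findAll(String xml, String expr)
    {
        try
        {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expr, parse(xml), XPathConstants.NODESET);
            List<String> results = new ArrayList<String>();
            for (int i = 0; i < nodes.getLength(); i++)
                results.add(nodes.item(i).getTextContent());
            return results;
        }
        catch (Exception ex)
        {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args)
    {
        String xml = "<list><item>a</item><item>b</item><item>c</item></list>";
        System.out.println(firstOnly(xml, "//item")); // a
        System.out.println(findAll(xml, "//item"));   // [a, b, c]
    }
}
```

Note that NODESET evaluation fails for expressions that do not return a node-set, such as count(...), so a tool offering both would need to fall back to the string form in that case.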

MapNamespaceContext.java

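public class MapNamespaceContext implements NamespaceContext
{
    private Map<String, String> uriMap = new HashMap<String, String>();    // prefix -> URI
    private Map<String, String> prefixMap = new HashMap<String, String>(); // URI -> prefix

    public MapNamespaceContext(Map<String, String> uriMap)
    {
        this.uriMap = uriMap;

        for (String key: uriMap.keySet())
            prefixMap.put(uriMap.get(key), key);
    }

    public MapNamespaceContext(String[] colonPairs)
    {
        for (String colonPair: colonPairs)
        {
            // Split at the FIRST colon only: the namespace URI itself contains colons.
            int colonIndex = colonPair.indexOf(':');
            if (colonIndex < 0)
                throw new IllegalArgumentException("Expected pfx:ns, got \"" + colonPair + "\"");
            uriMap.put(colonPair.substring(0, colonIndex).trim(),
                       colonPair.substring(colonIndex + 1));
        }

        for (String key: uriMap.keySet())
            prefixMap.put(uriMap.get(key), key);
    }

    public String getNamespaceURI(String prefix)
    { return (uriMap.get(prefix)); }

    public String getPrefix(String namespaceURI)
    { return (prefixMap.get(namespaceURI)); }

    public Iterator<String> getPrefixes(String namespaceURI)
    {
        // The interface contract calls for an Iterator (possibly empty), never null.
        String prefix = prefixMap.get(namespaceURI);
        return (prefix == null) ? Collections.<String>emptyList().iterator()
                                : Collections.singletonList(prefix).iterator();
    }
}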

xpathfind

#! /bin/bash
export JAVA_HOME=$(cygpath -u "$XPATHFIND_JAVA_HOME")
export PATH="$JAVA_HOME/bin:$PATH"

java -classpath "$XPATHFIND_HOME/lib/commons-cli-1.1.jar;$XPATHFIND_HOME/build/classes" xpathfind.XPathFind "$@"
