DOM vs SAX
There are two main approaches to parsing XML: stream-based (SAX) and document-based (DOM).
When I was first learning about parsing XML in Java I wrote a few small test programs. Here's an example using the DOM approach. It was obviously written in a hurry and contains some strange things that need tidying up. I'll address this in a later entry.
If you are simply going to construct a Document and then traverse it once, as this example does, you should probably consider SAX instead, since it reports each element as it is read and never holds the whole document in memory.
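For comparison, here is a rough idea of what a SAX version might look like. It is only a minimal sketch, not a tidied-up replacement for the program in this entry: the class name SaxSketch and the parse helper are made up for illustration, it reads the XML from a String rather than a file, and it prints the element path without the parentheses the DOM program adds.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxSketch {
    // Hypothetical helper: returns one "path = 'text'" line per non-blank run of text.
    static List<String> parse(String xml) throws Exception {
        final List<String> lines = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            // Names of the elements currently open, outermost first.
            private final Deque<String> path = new ArrayDeque<>();
            // characters() may deliver text in several chunks, so buffer it.
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName,
                    String qName, Attributes attrs) {
                flush();
                path.addLast(qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                flush();
                path.removeLast();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            // Record the buffered text unless it is only whitespace.
            private void flush() {
                if (text.toString().trim().length() > 0) {
                    lines.add(String.join(".", path) + " = '" + text + "'");
                }
                text.setLength(0);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return lines;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<inventory><animal type=\"mammal\"><name>Fred</name>"
                + "<species>Hippo</species></animal></inventory>";
        for (String line : parse(xml)) {
            System.out.println(line);
        }
    }
}
```

The handler only ever holds the current path and the current run of text, which is the main attraction of SAX when all you want is a single pass over the data.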
XML
Here's the example XML that the program parses:
<inventory>
  <animal type="mammal">
    <name>Fred</name>
    <species>Hippo</species>
    <weight units="Kg">1552</weight>
  </animal>
  <animal type="reptile">
    <name>
Gert
AKA Gertrude
the galloping reptile
</name>
    <species>Croc</species>
  </animal>
</inventory>
Output
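This is what the program displays. Note that the whitespace-only text used for indentation in the XML is skipped, but whitespace within the element data is retained. The parentheses are always empty because the program looks for the type attribute on the text node itself, and text nodes carry no attributes.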
inventory.animal.name() = 'Fred'.
inventory.animal.species() = 'Hippo'.
inventory.animal.weight() = '1552'.
inventory.animal.name() = '
Gert
AKA Gertrude
the galloping reptile
'.
inventory.animal.species() = 'Croc'.
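The whitespace point is worth a small demonstration. A default, non-validating DOM parser does not drop indentation by itself; it hands it over as text nodes, and it is the program that skips them. The sketch below counts those whitespace-only text nodes directly under the root element. The class name WhitespaceSketch and the helper countBlankTextChildren are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class WhitespaceSketch {
    // Hypothetical helper: counts the root element's children that are
    // text nodes containing nothing but whitespace.
    static int countBlankTextChildren(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList children = doc.getDocumentElement().getChildNodes();
        int blank = 0;
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                blank++;
            }
        }
        return blank;
    }

    public static void main(String[] args) throws Exception {
        String indented = "<inventory>\n  <animal/>\n  <animal/>\n</inventory>";
        String compact = "<inventory><animal/><animal/></inventory>";
        System.out.println(countBlankTextChildren(indented));
        System.out.println(countBlankTextChildren(compact));
    }
}
```

The indented document has a whitespace text node before, between and after the two animal elements, while the compact one has none.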
Java
This is the Java program that parses the XML.
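import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class ParseXMLbyDOM {
    public static void main(String[] args) {
        String filename = "XML/animals.xml";
        Document doc = null;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Build the whole document tree in memory before walking it.
            doc = builder.parse(new File(filename));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            e.printStackTrace();
        }
        // doc is still null if parsing failed; doRecursive copes with that.
        doRecursive(doc, "");
    }

    // Visit every child of node, passing down the dotted path built so far.
    private static void doRecursive(Node node, String name) {
        if (node == null)
            return;
        NodeList nodes = node.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node n = nodes.item(i);
            if (n == null)
                continue;
            doNode(n, name);
        }
    }

    // Elements extend the path and recurse; text nodes are printed.
    private static void doNode(Node node, String name) {
        switch (node.getNodeType()) {
        case Node.ELEMENT_NODE:
            String nodeName = name.isEmpty()
                    ? node.getNodeName()
                    : name + "." + node.getNodeName();
            doRecursive(node, nodeName);
            break;
        case Node.TEXT_NODE:
            String text = node.getNodeValue();
            // Skip the whitespace-only text that comes from indentation.
            // (The original compared against "\\r", a literal backslash
            // followed by 'r', which never matches a carriage return.)
            if (text.trim().isEmpty()) {
                break;
            }
            // One of the strange things: getAttributes() returns null for
            // a text node, so type is always empty here. The attribute
            // lives on an enclosing element, not on the text.
            String type = "";
            NamedNodeMap attrs = node.getAttributes();
            if (attrs != null) {
                Node attr = attrs.getNamedItem("type");
                if (attr != null) {
                    type = attr.getNodeValue();
                }
            }
            System.out.println(name + "(" + type + ") = '" + text + "'.");
            break;
        default:
            // Comments, processing instructions and so on.
            System.out.println("Other node " + node.getNodeType()
                    + " : " + node.getClass());
            break;
        }
    }
}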
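One of the strange things in the program is the type lookup: it asks the text node for its attributes, and text nodes have none. One possible fix, and it is only a sketch rather than the tidy-up promised above, is to walk up from the text node until an enclosing element carries the attribute. The class name AncestorAttributeSketch and the helpers findAttribute and typeOfFirstName are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AncestorAttributeSketch {
    // Hypothetical helper: value of attrName on the nearest enclosing
    // element that has it, or "" if no ancestor does.
    static String findAttribute(Node node, String attrName) {
        for (Node n = node; n != null; n = n.getParentNode()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                Element e = (Element) n;
                if (e.hasAttribute(attrName)) {
                    return e.getAttribute(attrName);
                }
            }
        }
        return "";
    }

    // Hypothetical helper: looks up "type" starting from the text inside
    // the first <name> element of the given XML.
    static String typeOfFirstName(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Node nameText = doc.getElementsByTagName("name").item(0).getFirstChild();
        return findAttribute(nameText, "type");
    }

    public static void main(String[] args) throws Exception {
        String xml = "<inventory><animal type=\"mammal\">"
                + "<name>Fred</name></animal></inventory>";
        System.out.println(typeOfFirstName(xml));
    }
}
```

Starting from the text "Fred", the search passes the name element, which has no type, and stops at the animal element.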