Creating a domain specific language with XML-Schema and JAX-B
Here at scoyo we use the Magnolia CMS to manage our website content. It’s quite a nice tool based on the Content Repository API for Java (JCR, JSR-170). In this blog post I want to tell you a little more about a small domain specific language which I created to manage the migration of content and configuration of our Magnolia CMS installation, using XML-Schema, JAX-B and the facilities of the CMS.
Let me introduce the domain a little. When you use a CMS together with other software components, the situation is very similar to having a database: you have content that other (non-technical) people manage, and when you change your application you sometimes have to change the content too, or worse, the structure of the content. In the database world there is a very nice book on refactoring databases.
As a solution to this issue, the Magnolia CMS offers an API built around info.magnolia.module.ModuleVersionHandler. This API lets you define info.magnolia.module.delta.Task instances that are tied to a specific module version, so the CMS can work out at startup which Tasks to execute when migrating, e.g., from module version 1.4.2 to 1.4.6. This is very nice because you can load partial XML content dumps or move some content (or configuration) nodes around.
Unfortunately, you have to program the update procedure in Java in your ModuleVersionHandler. To close this gap we introduced our own domain specific language using XML. I refer to it as a DSL because it evolved into much more than a simple XML config file; I hope you’ll get the point later on. The inspiration came from a freelance contractor with very deep Magnolia know-how. So here is how I did it:
First of all, the ModuleVersionHandler has to deliver all the information defined in the XML file, so you have to provide your own. We subclassed info.magnolia.module.DefaultModuleVersionHandler and overrode methods like getExtraInstallTasks(...). This new version handler is responsible for reading the XML file and converting it into the objects that Magnolia handles.
The next step was to define an XML format. It should be expressive, nice to read and easy to understand. So we decided against standard formats and facilities like the Spring Framework, which also offers ways to assemble object graphs from XML files. The format we chose looks like this:
<updates>
  <version number="1.5.0">
    <description>Changed implementation of header selection</description>
    <updates>
      <load file="config.modules.scoyo.dialogs.pageProperties.xml"/>
      <if-exists repository="WEBSITE" node="/en">
        <then>
          <set-property node="/de" repository="WEBSITE" property="reportSuiteIdLive">
            <value>test_en</value>
          </set-property>
        </then>
      </if-exists>
    </updates>
  </version>
</updates>
This is quite nice to read, and the documentation lives right where it belongs. To process the file automatically with JAX-B, you have to define an XML-Schema, which is then used to generate the Java bindings. Apart from that, it is always good to have an XML-Schema; I’ll give you some further examples later.
Our schema looks like this (at least in Eclipse):

We defined some elements (as seen on the left-hand side). Each of these elements contains some attributes (or inner elements) which are passed to the Magnolia Tasks. The key trick of this XML-Schema is the Task type and the <task /> element. The <task /> element is the head of a substitution group: every element that defines a specific task (e.g. <load />) must declare <task /> as its substitution group. As a result, wherever the schema allows a <task />, any other element of this substitution group may be placed instead. In fact you’ll never use <task /> itself but always a specific element like <load />. The Task XML type is the type of the <task /> element; when the XML bindings are generated, it is translated into an abstract super class for all specific types, such as the Load XML type, which defines the structure of the <load /> element.
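To make the substitution group idea a bit more tangible, here is a minimal sketch using only the JDK’s javax.xml.validation API. The schema embedded in the string is not our real one: it is a heavily reduced, hypothetical version containing just <updates>, an abstract <task /> head and <load />, and marking Task and <task /> as abstract is my assumption for this sketch. The class name SubstitutionGroupDemo and the method isValid are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SubstitutionGroupDemo {

    // Hypothetical, reduced schema: <load/> joins the substitution group headed by <task/>.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:complexType name='Task' abstract='true'/>"
        + "<xs:element name='task' type='Task' abstract='true'/>"
        + "<xs:complexType name='Load'>"
        + "  <xs:complexContent><xs:extension base='Task'>"
        + "    <xs:attribute name='file' type='xs:string' use='required'/>"
        + "  </xs:extension></xs:complexContent>"
        + "</xs:complexType>"
        + "<xs:element name='load' type='Load' substitutionGroup='task'/>"
        + "<xs:element name='updates'>"
        + "  <xs:complexType><xs:sequence>"
        // Only <task/> is referenced here; members of its group are accepted in its place.
        + "    <xs:element ref='task' minOccurs='0' maxOccurs='unbounded'/>"
        + "  </xs:sequence></xs:complexType>"
        + "</xs:element>"
        + "</xs:schema>";

    /** Returns true if the given XML document is valid against the sketch schema. */
    static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            // A broken schema is a programming error in this sketch, not a validation result.
            throw new IllegalStateException("Sketch schema does not compile", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the document violates the schema
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this schema, a document like <updates><load file='a.xml'/></updates> validates although <updates> only refers to <task />, while a literal <task /> is rejected because the head element is abstract.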
To generate the XML binding classes from the XML-Schema we use the jaxb2-maven-plugin for Apache Maven 2 from the Codehaus Mojo project. You may configure it like this:
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>jaxb2-maven-plugin</artifactId>
  <version>1.2</version>
  <executions>
    <execution>
      <id>bootstrap</id>
      <phase>generate-sources</phase>
      <configuration>
        <schemaFiles>bootstrap_1_2.xsd</schemaFiles>
        <packageName>...</packageName>
      </configuration>
      <goals>
        <goal>xjc</goal>
      </goals>
    </execution>
  </executions>
  <configuration>
    <verbose>true</verbose>
    <outputDirectory>${project.build.directory}/generated-sources/java</outputDirectory>
    <clearOutputDir>false</clearOutputDir>
  </configuration>
</plugin>
The JAX-B bindings produce nicely annotated Java beans which you can use in the further work. A nice side effect of using JAX-B is that it is blazing fast compared to processing your XML files with DOM or XPath. That is another nice effect of having an XML-Schema.
So, after defining the XML-Schema and generating the JAX-B bindings, the next thing to do is to unmarshal the data from the XML file (or InputStream), which takes no more than 20 lines of standard Java code. Then you have an object graph representing the XML, and the only thing left is to create the objects you (or the Magnolia CMS) need from this object graph. This should be an accomplishable task for a serious developer.
I chose to create a generic mapper interface like
interface TaskHandler<T> {
    // Converts one JAX-B bean of type T into the Magnolia task it describes.
    info.magnolia.module.delta.Task create(T xmlTask);
}
and used a java.util.Map to determine the right TaskHandler instance for an XML task bean instance, and voilà, I had all the info.magnolia.module.delta.Task instances which were defined in XML. Some glue code which does some sorting and other stuff completed my ModuleVersionHandler. A TaskHandler implementation could look like this:
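dictionary.put(Load.class, new TaskHandler<Load>(Load.class) {
    @Override
    public Task create(Load xmlTask) {
        // Name and description of the task; the formatted strings are presumably
        // the defaults used by the name(...) and description(...) helpers.
        String name = name(xmlTask, String.format("Load file '%s'.", xmlTask.getFile()));
        String description = description(xmlTask, String.format("Load file '%s' for XML import.", xmlTask.getFile()));
        // Location of the bootstrap file inside the module.
        String resource = String.format("/mgnl-bootstrap/%1$s/%2$s", moduleName, xmlTask.getFile());
        // On UUID collisions the existing node is replaced by the imported one.
        return new BootstrapSingleResource(name, description, resource,
                javax.jcr.ImportUUIDBehavior.IMPORT_UUID_COLLISION_REPLACE_EXISTING);
    }
});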
I used an anonymous inner class and added the instance directly to the dictionary, which maps all JAX-B types to their TaskHandler. The task handler itself uses the getters of the XML bean to create a new instance of the BootstrapSingleResource task. You will notice that the name of the element (<load />) is more readable, which is one of the benefits of using a specialized, domain specific XML file.
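The lookup in the dictionary is not shown above, so here is a minimal, self-contained sketch of how such a class-keyed map could dispatch XML beans to their handlers. It is not the original code: Task, XmlTask, Load and NamedTask are simplified stand-ins (for info.magnolia.module.delta.Task and the JAX-B generated beans) so that the example compiles without Magnolia, and TaskDictionary with its register and toTask methods is a hypothetical name for the glue code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Stand-in for info.magnolia.module.delta.Task (assumption, reduced to a name). */
interface Task {
    String getName();
}

/** Trivial Task implementation used only by this sketch. */
class NamedTask implements Task {
    private final String name;
    NamedTask(String name) { this.name = name; }
    public String getName() { return name; }
}

/** Stand-in for the abstract super class generated from the Task XML type. */
abstract class XmlTask {}

/** Stand-in for the bean generated from the Load XML type. */
class Load extends XmlTask {
    private final String file;
    Load(String file) { this.file = file; }
    String getFile() { return file; }
}

interface TaskHandler<T extends XmlTask> {
    Task create(T xmlTask);
}

class TaskDictionary {
    private final Map<Class<? extends XmlTask>, TaskHandler<? extends XmlTask>> handlers =
            new HashMap<>();

    /** Registering through this method keeps bean type and handler type in sync. */
    <T extends XmlTask> void register(Class<T> type, TaskHandler<T> handler) {
        handlers.put(type, handler);
    }

    /** Looks up the handler by the bean's exact runtime class and delegates to it. */
    @SuppressWarnings("unchecked") // safe as long as entries are only added via register
    <T extends XmlTask> Task toTask(T xmlTask) {
        TaskHandler<T> handler = (TaskHandler<T>) handlers.get(xmlTask.getClass());
        if (handler == null) {
            throw new IllegalArgumentException("No handler for " + xmlTask.getClass().getName());
        }
        return handler.create(xmlTask);
    }

    /** Converts a whole list of XML beans, preserving their order. */
    List<Task> toTasks(List<? extends XmlTask> xmlTasks) {
        List<Task> tasks = new ArrayList<>();
        for (XmlTask xmlTask : xmlTasks) {
            tasks.add(toTask(xmlTask));
        }
        return tasks;
    }
}

class TaskDictionaryDemo {
    public static void main(String[] args) {
        TaskDictionary dictionary = new TaskDictionary();
        dictionary.register(Load.class,
                load -> new NamedTask("Load file '" + load.getFile() + "'."));
        System.out.println(dictionary.toTask(new Load("a.xml")).getName());
    }
}
```

Note that this sketch looks handlers up by the exact class of the bean, so a subclass of Load would need its own entry. For beans generated from a schema that is usually fine, because each element maps to exactly one class.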
I promised to name some more advantages of having an XML-Schema. One is that you can validate the XML (quite boring). Another is that when you deploy the XML-Schema to a public location (e.g. a web server), you get auto-completion in Eclipse, which is very nice!
So, to sum up: sometimes it is more than just nice to have a solution specific to the problem domain. With very little effort you can create such a solution using the XML-Schema and JAX-B technology stack. The result in this case was a well-documented XML format which can handle various content migration scenarios. With some further levels of indirection (which were very simple) I was able to introduce programming language constructs like conditions (e.g. the <if-exists repository="WEBSITE" node="/en" /> element). I guess it’s okay to call it a language now, isn’t it?
There are many situations where standard solutions like the Spring Framework (or the SOAP message XML format) are the best choice to quickly assemble an object graph, but when the XML (or the content in general) is subject to changes made by humans, it is always worth considering a more human-readable variant.


