Fri, 29 Sep 2006

Element identity and structured data in XML [10:18]

Like most people, I first encountered XML through document formats: XHTML, SVG and so forth. So I've always had the idea that an XML element represents some sort of platonic ideal object: if you have an HTML anchor element or an SVG path element, its meaning and its content model are well defined. But documents aren't the only use for XML. For Mugshot, we pass lots of structured data between the server and client via XMPP, and between the server and AJAX web pages. And we often have things like:

<user userId="abcd12abcd" name="John Doe">
    <currentTrack>
        <artist>Johnny Cash</artist>
        <song>I Still Miss Someone</song>
        <playLink service="itunes" playlink="http://store.apple.com/..."/>
        <playLink service="yahoomusic" playlink="http://music.yahoo.com/..."/>
    </currentTrack>
    <favoriteTrack>
        <artist>The Beatles</artist>
        <song>Here Comes the Sun</song>
        <playLink service="itunes" playlink="http://store.apple.com/..."/>
    </favoriteTrack>
</user>

There are two weird things about the above from the “platonic ideal” perspective. First, the <currentTrack/> and <favoriteTrack/> elements don't really have any meaning other than the relationship of their content to the parent element. They are just “attributes” that happen to have structured XML data as their values. The second thing is that <currentTrack/> and <favoriteTrack/> have the same structure, even though the element names are different. You could try to fix up the second problem and use a common element name:

<user userId="abcd12abcd" name="John Doe">
    <currentTrack>
        <track>
            <artist>Johnny Cash</artist>
            <song>I Still Miss Someone</song>
            <playLink service="itunes" playlink="http://store.apple.com/..."/>
        </track>
    </currentTrack>
    [...]
</user>

Or you could even try to fix both problems by using a generic element to represent an “XML-valued” attribute:

<user userId="abcd12abcd" name="John Doe">
    <attr name="currentTrack">
        <track>
            <artist>Johnny Cash</artist>
            <song>I Still Miss Someone</song>
            <playLink service="itunes" playlink="http://store.apple.com/..."/>
        </track>
    </attr>
    [...]
</user>

Or we could split things up and actually use an attribute:

<track id="track1">
    <artist>Johnny Cash</artist>
    <song>I Still Miss Someone</song>
    <playLink service="itunes" playlink="http://store.apple.com/..."/>
</track>
<user userId="abcd12abcd" name="John Doe" currentTrack="track1"/>

But I think all of those are essentially silly. The natural way to do structured data in XML is simply different from document use. When you have structured data: 1) some elements have only a local meaning, in the context of their parent element; and 2) there is a concept of “type” that is distinct from element identity.
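
To make the "type versus element identity" point concrete, here is a rough sketch of how a client might read the first form of the data. It is not the actual Mugshot code: the names parse_track, parse_user and USER_FIELDS are made up for illustration, and it uses Python's standard xml.etree.ElementTree. The point is just that one reader handles the "track" type, and the element name only says which property of the user the value belongs to.

```python
import xml.etree.ElementTree as ET

DOC = """
<user userId="abcd12abcd" name="John Doe">
    <currentTrack>
        <artist>Johnny Cash</artist>
        <song>I Still Miss Someone</song>
        <playLink service="itunes" playlink="http://store.apple.com/..."/>
        <playLink service="yahoomusic" playlink="http://music.yahoo.com/..."/>
    </currentTrack>
    <favoriteTrack>
        <artist>The Beatles</artist>
        <song>Here Comes the Sun</song>
        <playLink service="itunes" playlink="http://store.apple.com/..."/>
    </favoriteTrack>
</user>
"""


def parse_track(elem):
    """Read the 'track' type from any element, whatever its name is."""
    return {
        "artist": elem.findtext("artist"),
        "song": elem.findtext("song"),
        # Map each service name to its link.
        "play_links": {
            link.get("service"): link.get("playlink")
            for link in elem.findall("playLink")
        },
    }


# Hypothetical table: child element name (local meaning) -> reader for its type.
USER_FIELDS = {
    "currentTrack": parse_track,
    "favoriteTrack": parse_track,
}


def parse_user(elem):
    """Read a <user> element; children are treated as XML-valued properties."""
    user = {"userId": elem.get("userId"), "name": elem.get("name")}
    for child in elem:
        reader = USER_FIELDS.get(child.tag)
        if reader is not None:
            user[child.tag] = reader(child)
    return user


user = parse_user(ET.fromstring(DOC))
```

Here <currentTrack/> and <favoriteTrack/> only mean something inside <user/>, and both map to the same reader, which is about as close as this sketch gets to saying "these two elements have the same type."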

Now, if there are any XML experts in the readership, they are probably now thinking that the above is some combination of blindingly obvious, woefully simplified, and hopelessly misguided. And they are doubtless right. But I found it a useful clarification of my thinking around XML design.