narrower <choice>

Submitted by eed3si9n on Thu, 10/14/2010 - 23:30

rt.DataRecord came out of the necessity to fulfill two requirements. The first is to convert an XML document into a native object so that it can be consumed more easily (also known as data binding); the second is to retain the ability to convert the native object back into the XML document (also known as round trip). The two requirements are not always aligned with each other. For example, if I were only worried about getting the round trip done, I could just hold on to the scala.xml.Node and use that to write the XML out, but that's not very useful from the data binding perspective.

The first goal of scalaxb was to cover the full range of XML Schema and to implement the round trip. Now that this goal is mostly fulfilled, my recent updates have aimed to improve the usability of the generated code while still maintaining roundtripability. One area where the pendulum has swung too far towards round trip is the code generated for <choice>. I'd like to review the history of code generation for <choice> and propose a new solution at the end.

first attempt

When I first implemented <choice>, I tried to generate a trait that represents all options under the choice, so that the complex type containing the choice could refer to it through that trait. I imagined it would look as follows:

case class Order(arg1: OrderOption, items: Seq[Item])
trait OrderOption
case class Address(name: String, street: String, city: String) extends OrderOption
case class InternalAddress(building: String, room: Int) extends OrderOption

There are several problems with this approach. Suppose the schema for the choice looked something like this:

<xs:choice>
  <xs:element name="groundShipping" type="Address" />
  <xs:element name="twoDayShipping" type="Address" />
  <xs:element name="oneDayShipping" type="Address" />
  <xs:element name="internalShipping" type="InternalAddress" />
</xs:choice>

The problem is that when we look at arg1, we wouldn't know which element name the Address was enclosed in. This demonstrates that an element in an XML document holds one more piece of information than the data structure that represents the corresponding complex type: the name of the element. There are at least two ways to solve this problem.
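To make the loss concrete, here is a minimal sketch. The bind function is a hypothetical stand-in for the generated parsing code, not something scalaxb actually emits; it only illustrates that the element name has nowhere to go.

```scala
object FirstAttemptSketch {
  trait OrderOption
  case class Address(name: String, street: String, city: String) extends OrderOption
  case class InternalAddress(building: String, room: Int) extends OrderOption

  // Hypothetical binder: the element name is available while parsing,
  // but Address has no field to keep it in, so it is dropped.
  def bind(elementName: String, name: String, street: String, city: String): OrderOption =
    Address(name, street, city)

  val ground: OrderOption = bind("groundShipping", "foo", "1537 Paper Street", "Wilmington")
  val twoDay: OrderOption = bind("twoDayShipping", "foo", "1537 Paper Street", "Wilmington")

  // Both elements bind to equal values, so the original name cannot be written back out.
  val indistinguishable: Boolean = ground == twoDay
}
```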

second attempt

The first is to generate a subclass for each of the elements:

case class GroundShipping(name: String, street: String, city: String) extends OrderOption
case class TwoDayShipping(name: String, street: String, city: String) extends OrderOption
case class OneDayShipping(name: String, street: String, city: String) extends OrderOption
case class InternalShipping(building: String, room: Int) extends OrderOption

This is not very nice, for two reasons. First, Scala does not allow a case class to inherit from another case class, so the structure needs to be repeated. Second, there is the way element names are scoped in XML: a child element can have the same name as a child of another top-level element while having a different data structure. To prevent the naming clash, each class would need to be prefixed with the parent class name.

second approach to the second attempt

The second approach is to store the name of the element separately. This is the idea behind rt.DataRecord. The code would look like this:

case class Order(arg1: rt.DataRecord[OrderOption], items: Seq[Item])
trait OrderOption
case class Address(name: String, street: String, city: String) extends OrderOption
case class InternalAddress(building: String, room: Int) extends OrderOption

arg1 would then store something like:

rt.DataRecord(None, Some("twoDayShipping"), Address("foo", "1537 Paper Street", "Wilmington"))
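As a rough sketch of how such a record serves both requirements, assume a simplified DataRecord with an optional namespace, an optional element name, and a value. This is my own stand-in for illustration; the real rt.DataRecord in scalaxb may be defined differently, and describe is a hypothetical consumer.

```scala
object DataRecordSketch {
  // Simplified stand-in for rt.DataRecord (an assumption, not the scalaxb source):
  // an optional namespace, an optional element name, and the bound value.
  case class DataRecord[+A](namespace: Option[String], key: Option[String], value: A)

  trait OrderOption
  case class Address(name: String, street: String, city: String) extends OrderOption
  case class InternalAddress(building: String, room: Int) extends OrderOption

  val arg1: DataRecord[OrderOption] =
    DataRecord(None, Some("twoDayShipping"), Address("foo", "1537 Paper Street", "Wilmington"))

  // The key keeps the element name for the round trip;
  // the value stays a typed object for data binding.
  def describe(record: DataRecord[OrderOption]): String = record match {
    case DataRecord(_, Some(key), Address(_, _, city)) =>
      s"$key to $city"
    case DataRecord(_, Some(key), InternalAddress(building, room)) =>
      s"$key to $building, room $room"
    case _ =>
      "unknown option"
  }
}
```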

Well, none of the above ever happened because I hit another problem first. See the following schema for choice:

<xs:choice>
  <xs:element name="groundShipping" type="Address" />
  <xs:element name="twoDayShipping" type="Address" />
  <xs:element name="oneDayShipping" type="Address" />
  <xs:element name="internalShipping" type="xs:string" />
</xs:choice>

Right. What if one of the options were a simple type like xs:string?

third attempt

Faced with simple types and <any>, I gave up on being type safe for <choice>. All choices could be represented using rt.DataRecord[Any], which can store the element name and a value of any type.

case class Order(arg1: rt.DataRecord[Any], items: Seq[Item])
trait OrderOption
case class Address(name: String, street: String, city: String) extends OrderOption

At least this solves the round trip problem, but I never felt good about the lack of type safety. Here arises the notion of a narrower <choice>. The best way would be to figure out the least common supertype of all types in a given <choice>. Of course, this is easier said than done. Within the context of scalaxb, we are dealing with Scala code in text form, not real types, and to top it off, many of the types that need to be evaluated do not even exist yet. Because XML Schema allows complex types to be derived from one another, we have to worry about lineage even among the generated types. For example, if there's a choice between USAddress and UKAddress, both derived from a common address type, the least common supertype would be trait Addressable.
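To show what the lack of type safety costs the consumer, here is a small sketch. DataRecord is again a simplified stand-in of my own, and destination is a hypothetical helper; the point is only that with Any, every use site has to match at runtime and carry a fallback the compiler cannot rule out.

```scala
object AnyChoiceSketch {
  // Simplified stand-in for rt.DataRecord (an assumption, not the scalaxb source).
  case class DataRecord[+A](namespace: Option[String], key: Option[String], value: A)
  case class Address(name: String, street: String, city: String)

  val complex: DataRecord[Any] =
    DataRecord(None, Some("groundShipping"), Address("foo", "1537 Paper Street", "Wilmington"))
  val simple: DataRecord[Any] =
    DataRecord(None, Some("internalShipping"), "Building 4, Room 12")

  // With Any, the value must be inspected at runtime.
  def destination(record: DataRecord[Any]): Option[String] = record.value match {
    case Address(_, _, city) => Some(city)
    case text: String        => Some(text)
    case _                   => None // nothing stops an unexpected value from arriving here
  }
}
```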

fourth attempt

I did, however, realize that there are some special cases in which I could narrow down the type parameter of rt.DataRecord. The first is when all options have the same type.

<xs:choice>
  <xs:element name="groundShipping" type="xs:string" />
  <xs:element name="internalShipping" type="xs:string" />
</xs:choice>

If both options are String, we can safely say that whichever option is chosen, the value will be a String.

case class Order(arg1: rt.DataRecord[String], items: Seq[Item])
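A short usage sketch, assuming the same simplified DataRecord stand-in as before and a hypothetical Item class (the real Item is not shown in this post): with the narrowed type, the value can be used as a String directly, with no match or cast.

```scala
object SameTypeChoiceSketch {
  // Simplified stand-in for rt.DataRecord (an assumption, not the scalaxb source).
  case class DataRecord[+A](namespace: Option[String], key: Option[String], value: A)
  // Hypothetical placeholder for the Item type.
  case class Item(name: String)
  case class Order(arg1: DataRecord[String], items: Seq[Item])

  val order: Order =
    Order(DataRecord(None, Some("internalShipping"), "Building 4"), Seq(Item("pen")))

  // String methods are available on the value without any runtime check.
  val label: String = order.arg1.value.toUpperCase
}
```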

Another case in which I can deduce a supertype is when the options consist only of complex types within the same namespace, sequences, or choices, all of which extend the option trait.

<xs:choice>
  <xs:element name="groundShipping" type="Address" />
  <xs:element name="twoDayShipping" type="Address" />
  <xs:element name="oneDayShipping" type="Address" />
  <xs:element name="internalShipping" type="InternalAddress" />
</xs:choice>

So, for the above <choice>, it would generate:

case class Order(arg1: rt.DataRecord[OrderOption], items: Seq[Item])
trait OrderOption
case class Address(name: String, street: String, city: String) extends OrderOption
case class InternalAddress(building: String, room: Int) extends OrderOption

This is the same as what I had before I expanded everything to rt.DataRecord[Any]. In this case, all of the options extend OrderOption because they are local complex types, so I can narrow the type to OrderOption without checking the lineage of the types. This is not always the narrowest type possible: if both Address and InternalAddress extended something like trait Addressable, that might have been narrower. But it is still better than Any.
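The two special cases can be summarized as a small decision function. This is only a sketch of the rule as described in this post, not the actual scalaxb generator code; OptionType, Builtin, LocalComplex, Wildcard, and narrow are all hypothetical names, and types are represented as plain strings because the generator works with Scala code in text form.

```scala
object NarrowingSketch {
  // Hypothetical classification of the options inside a <choice>.
  sealed trait OptionType
  case class Builtin(scalaName: String) extends OptionType      // e.g. xs:string mapped to String
  case class LocalComplex(scalaName: String) extends OptionType // complex type that extends the option trait
  case object Wildcard extends OptionType                       // <any>, or anything we cannot reason about

  // Returns the type parameter to use for DataRecord, as source text.
  def narrow(optionTrait: String, options: Seq[OptionType]): String =
    options.distinct match {
      // Special case 1: every option has the same type.
      case Seq(Builtin(name))      => name
      case Seq(LocalComplex(name)) => name
      // Special case 2: only local complex types, all extending the option trait.
      case ds if ds.nonEmpty && ds.forall(_.isInstanceOf[LocalComplex]) => optionTrait
      // Otherwise fall back to the third attempt.
      case _ => "Any"
    }
}
```

For the schema above, narrow("OrderOption", ...) applied to three Address options and one InternalAddress option yields OrderOption, while mixing in a single xs:string option falls back to Any.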