<import>, <any>, and simple types

importing types

I finally implemented a rough cut of <import> support. The process is looking more and more like a language compiler. When you process multiple XSD schemas, the actual parsing step doesn't really care about the other files, but the type checking/object binding phase needs to be aware of the other schemas.

In the schema, all you have to do is write

<import namespace="http://www.example.com/IPO"/>

and the types and other components from the given namespace are imported into the current schema. So this works more like Java's import than C's #include: it refers to a namespace, not to a physical file.

I've amended the parser to take context: Seq[SchemaDecl] = Nil as an additional parameter, which contains the list of schemas from previously parsed files. This implies that the compiler needs to know which file to parse first: if type Foo is defined in file A, and type Bar in file B uses Foo, file A has to be parsed before file B. The dependency graph is calculated from the targetNamespace attribute of each schema and its <import> elements. Ideally, I should probably separate the type checking from the parsing.
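A minimal sketch of that ordering step, assuming a hypothetical SchemaInfo record that holds only a file name, its targetNamespace, and the namespaces named by its <import> elements (the real SchemaDecl carries much more). It is a depth-first topological sort: a schema is emitted only after every schema that defines a namespace it imports. Imported namespaces that no input file defines are skipped, and a cycle is reported as an error. XSD does allow mutual imports, which is one more reason to separate type checking from parsing.

```scala
import scala.collection.mutable

// Hypothetical, simplified view of a schema: just enough to order the files.
case class SchemaInfo(file: String, targetNamespace: String, imports: Seq[String])

object ParseOrder {
  // Depth-first topological sort: each schema is placed after every schema
  // whose targetNamespace appears in its <import> elements.
  def order(schemas: Seq[SchemaInfo]): List[SchemaInfo] = {
    val byNamespace: Map[String, Seq[SchemaInfo]] = schemas.groupBy(_.targetNamespace)
    val done     = mutable.LinkedHashSet.empty[SchemaInfo] // keeps insertion order
    val visiting = mutable.Set.empty[SchemaInfo]           // schemas on the current DFS path

    def visit(schema: SchemaInfo): Unit =
      if (!done.contains(schema)) {
        // Meeting a schema that is still on the path means the imports form a cycle.
        if (!visiting.add(schema))
          sys.error("Circular <import> involving " + schema.file)
        // Imported namespaces that no input file defines are simply skipped.
        for (ns <- schema.imports; dep <- byNamespace.getOrElse(ns, Nil))
          visit(dep)
        visiting -= schema
        done += schema
      }

    schemas.foreach(visit)
    done.toList
  }
}
```

The resulting list could then be walked left to right, handing each schema the ones parsed before it as its context.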

Type lookup was also updated to go through all of the schemas, checking the current one first:

def getType(namespace: String, typeName: String): TypeDecl =
  if (namespace == targetNamespace && types.contains(typeName))
    types(typeName)  // defined in this schema
  else
    // otherwise search the schemas from previously parsed files
    context.find(schema => schema.targetNamespace == namespace &&
        schema.types.contains(typeName)) match {
      case Some(schema) => schema.types(typeName)
      case None         => sys.error("Type not found: {" + namespace + "}:" + typeName)
    }

There are still some things I haven't implemented, but it's a good start for import and multiple-file support.

dealing with <any>

As I compile real-world schema files, I keep discovering more issues, like the big jackass <any> element.

The XML representation for a wildcard schema component is an <any> or <anyAttribute> element information item.

This guy salts my game because I am trying to bind the schema to some class structure, and he's saying that now any element can be a child. For now, I am just going to ignore anything that is not mentioned explicitly in the schema and throw it out. This may change if I want to implement a round trip from XML to objects and back to XML. Anyway, a partial function can be used here to generate both the conversion and the check.

Suppose we have

<element name="Element1">
  <complexType>
    <sequence>
      <choice maxOccurs="unbounded">
        <element ref="Choice1"/>
        <element ref="Choice2"/>
        <any namespace="##other" processContents="lax" />
      </choice>
    </sequence>
  </complexType>
</element>

This now generates

case class Element1(arg1: Element1Option*) extends DataModel {
}
 
object Element1 {
  def fromXML(node: scala.xml.Node): Element1 =
    Element1(node.child.filter(Element1Option.fromXML.isDefinedAt(_)).map(
        Element1Option.fromXML(_)).toList: _*) 
}
 
trait Element1Option
 
object Element1Option {
  def fromXML: PartialFunction[scala.xml.Node, Element1Option] = {
    case elem: scala.xml.Elem if elem.label == "Choice1" => Choice1.fromXML(elem)
    case elem: scala.xml.Elem if elem.label == "Choice2" => Choice2.fromXML(elem)
  }
}

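In the above code, Element1Option.fromXML is a partial function because it covers only part of the input domain. Since it is written as a pattern-matching anonymous function, Scala automatically generates Element1Option.fromXML.isDefinedAt, which returns true only for the nodes it can handle. Everything else, including children matched by <any> and whitespace text nodes, is filtered out and dropped.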
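Because Element1.fromXML filters with isDefinedAt and then maps with the same function, the pair can be collapsed into Seq's collect, which takes a PartialFunction and applies it only where it is defined. A minimal sketch; to keep it runnable without the scala-xml module, it uses hypothetical Node, Elem, and Text case classes in place of scala.xml, and the simplified Choice1/Choice2 just hold the element text.

```scala
// Hypothetical stand-ins for scala.xml nodes so the sketch runs without scala-xml.
sealed trait Node
case class Elem(label: String, text: String) extends Node
case class Text(text: String) extends Node

trait Element1Option
// Simplified options that just keep the element's text.
case class Choice1(text: String) extends Element1Option
case class Choice2(text: String) extends Element1Option

object Element1Option {
  // A pattern-matching anonymous function: the compiler derives isDefinedAt
  // from these cases, so anything else (e.g. an <any> child) is "undefined".
  val fromXML: PartialFunction[Node, Element1Option] = {
    case Elem("Choice1", text) => Choice1(text)
    case Elem("Choice2", text) => Choice2(text)
  }
}

object Element1Parser {
  // collect = filter(isDefinedAt) followed by map, in a single pass.
  def options(children: Seq[Node]): Seq[Element1Option] =
    children.collect(Element1Option.fromXML)
}
```

With scala.xml, the equivalent should be node.child.collect(Element1Option.fromXML), since node.child is a Seq[Node].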

dealing with simple types

Options can become even trickier when you throw in simple types, which I usually bind to native Scala types. As with Element1Option, I express options using a Scala trait, which means both Choice1 and Choice2 mix in Element1Option. What if one of the options is an element with a simple type like xs:positiveInteger? You can't mix anything into Int. You could make some generic RichInt kind of thing, but then I would need to keep track of all the choices whose underlying type is Int. There are two problems with that approach. First, it's probably not possible when there are multiple schemas. Second, two options could map to the same native type, and I wouldn't be able to tell one from the other in a maxOccurs > 1 situation. It could be a choice between <workPhone>212-555-5555</workPhone> and <mobile>212-333-3333</mobile>, where both values would end up as plain Strings.

So I decided to generate a case class for each simple-type option, which looks like this:

case class Element1Choice2(value: Int) extends DataModel with Element1Option
 
object Element1Choice2 {
  def fromXML(node: scala.xml.Node): Element1Choice2 =
    Element1Choice2(node.text.trim.toInt)  // integer types collapse surrounding whitespace
}

This is kind of ugly, but it gets the job done. (Update: options are now wrapped in DataRecord[A]; see round trip.)
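To see why a separate wrapper per simple-type option matters, here is a minimal sketch of the workPhone/mobile case. The Contact* names are hypothetical, both options are assumed to be plain strings, and the input is simplified to (label, text) pairs instead of scala.xml nodes. Because each option gets its own case class, a parsed sequence still records which element each value came from, which a bare Seq[String] could not.

```scala
// Hypothetical choice between <workPhone> and <mobile>, both assumed to be xs:string.
sealed trait ContactOption

// One wrapper case class per option, mirroring Element1Choice2: the class
// itself records which element the value came from.
case class ContactWorkPhone(value: String) extends ContactOption
case class ContactMobile(value: String) extends ContactOption

object ContactOption {
  // Input simplified to (element label, text content) pairs instead of scala.xml nodes.
  val fromPair: PartialFunction[(String, String), ContactOption] = {
    case ("workPhone", text) => ContactWorkPhone(text.trim)
    case ("mobile", text)    => ContactMobile(text.trim)
  }

  // When writing XML back out, pattern matching recovers the element name.
  def label(option: ContactOption): String = option match {
    case ContactWorkPhone(_) => "workPhone"
    case ContactMobile(_)    => "mobile"
  }
}
```

The cost is one generated class per option, but it keeps the element identity that a round trip back to XML would need.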