mixed content revisited

scalaxb added support for mixed content a while back. When a complex type is declared with <xs:complexType mixed="true">, text nodes may appear interleaved with its subelements, as in XHTML. Ever since I implemented it, it's been bothering me that the generated case class is not DRY.

For example,

<xs:element name="mixedTest">
  <xs:complexType mixed="true">
    <xs:choice maxOccurs="unbounded">
      <xs:element name="billTo" type="Address"/>
      <xs:any namespace="##other" processContents="lax" />
    </xs:choice>
  </xs:complexType>
</xs:element>

would generate case class Element3(arg1: Seq[rt.DataRecord[Any]], mixed: Seq[rt.DataRecord[Any]]). The first parameter, arg1, holds the subelements; the second, mixed, holds both the subelements and the text nodes. For the case class to round-trip back to XML, mixed needed to store the text nodes and the subelements in document order. However, because parsing was performed only on the subelements, only arg1 contained the Address case class, while mixed contained an unparsed scala.xml.Elem instance for the same element. Not very nice.
This was partly because the parsing logic was not very sophisticated.

Now that real parsers are used for parsing, it was time to revisit this issue. We still need mixed to preserve the order, but if mixed were parsed properly, we could get rid of arg1. So I updated the parsing logic to treat text nodes as part of the grammar when the complex type is mixed. Here's what scalaxb generates from the above example:

case class MixedTest(mixed: Seq[rt.DataRecord[Any]])
 
object MixedTest extends rt.ElemNameParser[MixedTest] {
  val targetNamespace: Option[String] = Some("http://www.example.com/mixed")
  def isMixed: Boolean = true
 
  def parser(node: scala.xml.Node): Parser[MixedTest] =
    optTextRecord ~ rep((((((rt.ElemName(targetNamespace, "billTo")) ^^
          (x => rt.DataRecord(x.namespace, Some(x.name), Address.fromXML(x.node)))) ~
        optTextRecord) ^^
        { case p1 ~ p2 => Seq.concat(Seq(p1), p2.toList) })) |
      (((any ^^ (x => rt.DataRecord(x.namespace, Some(x.name), x.node))) ~
        optTextRecord) ^^
        { case p1 ~ p2 => Seq.concat(Seq(p1), p2.toList) })) ~ optTextRecord ^^
      { case p1 ~ p2 ~ p3 => MixedTest(Seq.concat(p1.toList, p2.flatten, p3.toList)) }
 
  def toXML(__obj: MixedTest, __namespace: Option[String], __elementLabel: Option[String], __scope: scala.xml.NamespaceBinding): scala.xml.NodeSeq = {
    var attribute: scala.xml.MetaData  = scala.xml.Null
 
    scala.xml.Elem(rt.Helper.getPrefix(__namespace, __scope).orNull,
      __elementLabel getOrElse { error("missing element label.") },
      attribute, __scope,
      __obj.mixed.flatMap(x => rt.DataRecord.toXML(x, x.namespace, x.key, __scope).toSeq): _*)
  }
}

As you can see, billTo is now parsed into an Address, and it's stored only once. That seemed to solve the problem, but it created a whole new issue for round-tripping: DataRecord.toXML no longer knew how to output XML, since mixed no longer stores scala.xml.Elem. mixed is declared as Seq[rt.DataRecord[Any]] so it can store built-in types like Int and String, XML nodes like scala.xml.Elem, and user-defined case classes like Address. XML output logic for built-in types and XML nodes can be shipped with scalaxb, but user-defined types need to be supported too. This looked like a good opportunity for me to try implementing a type class:

import scala.xml.{NamespaceBinding, NodeSeq}
trait XMLWriter[A] {
  implicit val ev = this
  def toXML(__obj: A, __namespace: Option[String], __elementLabel: Option[String],
      __scope: NamespaceBinding): NodeSeq
}
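A rough, self-contained sketch of the type class idea (not scalaxb's actual runtime): it uses a hypothetical Writer[A] that renders to a String instead of scala.xml.NodeSeq, so it runs without the scala-xml module. Instances for built-in types live in the type class's companion object, and the hypothetical render helper asks for one through a context bound.

```scala
// Hypothetical, simplified stand-in for XMLWriter[A]: it renders to a String
// instead of scala.xml.NodeSeq so the sketch needs no scala-xml dependency.
trait Writer[A] {
  def toXML(obj: A, elementLabel: String): String
}

object Writer {
  // Instances for built-in types can ship together with the type class.
  implicit object StringWriter extends Writer[String] {
    def toXML(obj: String, elementLabel: String): String =
      s"<$elementLabel>$obj</$elementLabel>"
  }

  implicit object IntWriter extends Writer[Int] {
    def toXML(obj: Int, elementLabel: String): String =
      s"<$elementLabel>$obj</$elementLabel>"
  }

  // Generic code only states "A must have a Writer"; the compiler picks
  // the instance from the static type of obj at the call site.
  def render[A: Writer](obj: A, elementLabel: String): String =
    implicitly[Writer[A]].toXML(obj, elementLabel)
}
```

Because the instance is chosen from the static type at the call site, render cannot be called with a value typed only as Any (there is no Writer[Any]), which is exactly what makes DataRecord[Any] tricky.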

All of the generated companion objects already implement toXML, so they just have to extend XMLWriter[A]. However, I could not find a way to get an XMLWriter[A] back out of a value typed Any once it's stored in DataRecord[Any], which means the XMLWriter[A] has to be stored in the DataRecord[A] itself. The problem with that approach is that it introduces an extra parameter whose value is fully determined by the value type A: for String it will always be __StringXMLWriter, and for Address it will always be the Address object. On top of that, the extra parameter is just noise during pattern matching. Here's how I worked around it.

First, add a constructor helper method called def dataRecord under object DataRecord, which takes the first three parameters explicitly and the XMLWriter[A] implicitly, using the context-bound syntax:

def dataRecord[A:XMLWriter](namespace: Option[String], key: Option[String], value: A): DataRecord[A] =
  DataRecord(namespace, key, value, implicitly[XMLWriter[A]])
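A context bound is shorthand for an extra implicit parameter list. The sketch below compares the two spellings and shows why the writer is stored at all: once records of different types sit in one Seq, each still carries the writer captured while its type was known. Writer, Record, Demo, and the "value" fallback label are hypothetical stand-ins, not scalaxb's definitions, and output is a String rather than NodeSeq.

```scala
// Hypothetical, simplified stand-in for XMLWriter[A]: renders to a String.
trait Writer[A] {
  def toXML(obj: A, elementLabel: String): String
}

object Writer {
  implicit object StringWriter extends Writer[String] {
    def toXML(obj: String, elementLabel: String): String =
      s"<$elementLabel>$obj</$elementLabel>"
  }

  implicit object IntWriter extends Writer[Int] {
    def toXML(obj: Int, elementLabel: String): String =
      s"<$elementLabel>$obj</$elementLabel>"
  }
}

// Hypothetical four-field record, like the case class DataRecord at this stage.
final case class Record[A](namespace: Option[String], key: Option[String], value: A, writer: Writer[A]) {
  // Inside the class A is still known, so the stored writer matches the value.
  // "value" is an arbitrary fallback label chosen for this sketch.
  def toXML: String = writer.toXML(value, key.getOrElse("value"))
}

object Record {
  // Context-bound form, mirroring def dataRecord.
  def dataRecord[A: Writer](namespace: Option[String], key: Option[String], value: A): Record[A] =
    Record(namespace, key, value, implicitly[Writer[A]])

  // Roughly what the context bound desugars to: an extra implicit parameter list.
  def dataRecordExplicit[A](namespace: Option[String], key: Option[String], value: A)(
      implicit writer: Writer[A]): Record[A] =
    Record(namespace, key, value, writer)
}

object Demo {
  private val name = Record.dataRecord(None, Some("name"), "hello")
  private val count = Record.dataRecord(None, Some("count"), 42)

  // The element types are forgotten here, but each record kept its writer.
  val records: Seq[Record[_]] = Seq(name, count)
}
```

Both construction helpers resolve Writer[String] from the Writer companion object, so they build equal records. The downside is also visible: a pattern such as `case Record(ns, k, v, _)` needs a placeholder for the writer.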

At this point, we need to supply implicit values for the built-in types and XML node types that scalaxb uses:

object XMLWriter {
  implicit object __NodeXMLWriter extends XMLWriter[Node] {
    def toXML(__obj: Node, __namespace: Option[String], __elementLabel: Option[String],
      __scope: NamespaceBinding): NodeSeq = __obj
  }
 
  implicit object __StringXMLWriter extends XMLWriter[String] {
    def toXML(__obj: String, __namespace: Option[String], __elementLabel: Option[String],
        __scope: scala.xml.NamespaceBinding): scala.xml.NodeSeq =
      Helper.stringToXML(__obj, __namespace, __elementLabel, __scope) 
  }
...
}

An interesting thing about the Scala spec is where it looks for implicit parameters. From Programming in Scala, pp. 440-441:

Moreover, with one exception, the implicit conversion must be in scope as a single identifier.

There's one exception to the "single identifier" rule. The compiler will also look for implicit definitions in the companion object of the source or expected target types of the conversion.

Note the implicit val ev = this in the definition of XMLWriter[A]:

trait XMLWriter[A] {
  implicit val ev = this
  def toXML(__obj: A, __namespace: Option[String], __elementLabel: Option[String],
      __scope: NamespaceBinding): NodeSeq
}

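Since the Address companion object extends XMLWriter[Address], it inherits ev, which makes the Address object available as an implicit XMLWriter[Address]. No import is needed: when the compiler looks for an implicit XMLWriter[Address], it also searches the companion objects of XMLWriter and of the type argument Address.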
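A minimal sketch of the `implicit val ev = this` trick, assuming a simplified Writer[A] that renders to a String. Here Writer has no companion object of its own, so the only place the compiler can find a Writer[Address] is the Address companion, which inherits ev. The sketch gives ev an explicit type, which is generally recommended for implicit members; Render is a hypothetical helper.

```scala
// Hypothetical, simplified stand-in for XMLWriter[A]: renders to a String.
trait Writer[A] {
  // Every instance re-exports itself as an implicit member.
  implicit val ev: Writer[A] = this
  def toXML(obj: A, elementLabel: String): String
}

final case class Address(street: String, city: String)

// The companion object is the writer; it inherits the implicit ev,
// and companions of type arguments are part of the implicit scope.
object Address extends Writer[Address] {
  def toXML(obj: Address, elementLabel: String): String =
    s"<$elementLabel><street>${obj.street}</street><city>${obj.city}</city></$elementLabel>"
}

// Hypothetical generic helper that needs a Writer for its argument.
object Render {
  def render[A: Writer](obj: A, elementLabel: String): String =
    implicitly[Writer[A]].toXML(obj, elementLabel)
}
```

No import is needed at the call site, and `implicitly[Writer[Address]]` returns the Address object itself.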

The new def dataRecord solves the construction of DataRecord, but we are still stuck with four fields when pattern matching.

Pattern matching is nothing but an application of def unapply. To keep compatibility with the older DataRecord, we can define DataRecord as a trait with the three original values, and define unapply in object DataRecord as follows:

def unapply[A](record: DataRecord[A]): Option[(Option[String], Option[String], A)] =
  Some((record.namespace, record.key, record.value))
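A small sketch of pattern matching on a trait through a hand-written unapply. The hypothetical Impl class carries a fourth field that the three-field extractor hides, the same role the writer plays in DataRecord; Describe is also hypothetical.

```scala
// The three original values, now abstract members of a trait.
trait DataRecord[+A] {
  val namespace: Option[String]
  val key: Option[String]
  val value: A
}

object DataRecord {
  // Hypothetical implementation class with a fourth field that callers
  // should not have to mention when pattern matching.
  private final case class Impl[+A](namespace: Option[String], key: Option[String],
      value: A, hidden: String) extends DataRecord[A]

  def apply[A](namespace: Option[String], key: Option[String], value: A): DataRecord[A] =
    Impl(namespace, key, value, "not part of the pattern")

  // Patterns like DataRecord(ns, k, v) call this, so they have three fields.
  def unapply[A](record: DataRecord[A]): Option[(Option[String], Option[String], A)] =
    Some((record.namespace, record.key, record.value))
}

object Describe {
  def describe(r: DataRecord[Any]): String = r match {
    case DataRecord(_, Some(k), v: Int) => s"$k is an Int: $v"
    case DataRecord(_, Some(k), v)      => s"$k holds $v"
    case DataRecord(_, None, v)         => s"unlabeled $v"
  }
}
```

Callers never see Impl; their patterns keep the old three-field shape.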

Now that pattern matching is faked, we might as well fake object construction too. Instead of def dataRecord, we can name the helper def apply to mimic the constructor of DataRecord. To actually hold the values, including the XMLWriter[A], we define a private case class within object DataRecord:

trait DataRecord[+A] {
  val namespace: Option[String]
  val key: Option[String]
  val value: A
}
 
object DataRecord {
  // XMLWriter[A] is invariant, so it cannot be a field of a class that is
  // covariant in A; the writer is stored as XMLWriter[_] and cast back in toXML.
  private case class DataWriter[+A](
    namespace: Option[String],
    key: Option[String],
    value: A,
    writer: XMLWriter[_]) extends DataRecord[A]
 
  def apply[A:XMLWriter](namespace: Option[String], key: Option[String], value: A): DataRecord[A] =
    DataWriter(namespace, key, value, implicitly[XMLWriter[A]])
 
  def apply[A:XMLWriter](value: A): DataRecord[A] =
    apply(None, None, value)
 
  def unapply[A](record: DataRecord[A]): Option[(Option[String], Option[String], A)] =
    Some((record.namespace, record.key, record.value))
 
  def toXML[A](__obj: DataRecord[A], __namespace: Option[String], __elementLabel: Option[String],
      __scope: scala.xml.NamespaceBinding): scala.xml.NodeSeq = __obj match {
    // The writer was captured for exactly this value's type, so the cast is safe.
    case w: DataWriter[_] => w.writer.asInstanceOf[XMLWriter[A]].toXML(__obj.value, __namespace, __elementLabel, __scope)
    case _ => sys.error("unknown DataRecord.")
  }
}

Now we have a backward-compatible DataRecord that also does type-specific XML output.
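Putting the pieces together, here is a runnable miniature of the final design, under the simplifying assumption that writers return a String rather than scala.xml.NodeSeq. The toXML signatures, the Address fields, the "address" fallback label, and MixedDemo are illustrative, not scalaxb's generated code.

```scala
// Hypothetical, simplified stand-in for XMLWriter: output is a String.
trait XMLWriter[A] {
  // Makes every instance, including companion objects, an implicit value.
  implicit val ev: XMLWriter[A] = this
  def toXML(obj: A, key: Option[String]): String
}

object XMLWriter {
  implicit object StringXMLWriter extends XMLWriter[String] {
    def toXML(obj: String, key: Option[String]): String = key match {
      case Some(k) => s"<$k>$obj</$k>"
      case None    => obj // an unlabeled string is emitted as a text node
    }
  }
}

// Public face of a record: the three original values only.
trait DataRecord[+A] {
  val namespace: Option[String]
  val key: Option[String]
  val value: A
}

object DataRecord {
  // Hidden implementation that also remembers the writer. XMLWriter[A] is
  // invariant, so the field is typed XMLWriter[_] to keep DataWriter covariant.
  private final case class DataWriter[+A](namespace: Option[String], key: Option[String],
      value: A, writer: XMLWriter[_]) extends DataRecord[A]

  // Looks like a case class constructor; the writer is picked up implicitly.
  def apply[A: XMLWriter](namespace: Option[String], key: Option[String], value: A): DataRecord[A] =
    DataWriter(namespace, key, value, implicitly[XMLWriter[A]])

  def apply[A: XMLWriter](value: A): DataRecord[A] = apply(None, None, value)

  // Three-field extractor, so existing patterns keep working.
  def unapply[A](record: DataRecord[A]): Option[(Option[String], Option[String], A)] =
    Some((record.namespace, record.key, record.value))

  def toXML[A](record: DataRecord[A]): String = record match {
    // The writer was captured for this value's type, so the cast is safe.
    case w: DataWriter[_] => w.writer.asInstanceOf[XMLWriter[A]].toXML(record.value, record.key)
    case _                => sys.error("unknown DataRecord.")
  }
}

// A user-defined type whose companion object is its writer.
final case class Address(street: String, city: String)

object Address extends XMLWriter[Address] {
  def toXML(obj: Address, key: Option[String]): String = {
    val label = key.getOrElse("address")
    s"<$label><street>${obj.street}</street><city>${obj.city}</city></$label>"
  }
}

object MixedDemo {
  private val before = DataRecord("Ship to ")
  private val billTo = DataRecord(None, Some("billTo"), Address("1 Main St", "Springfield"))
  private val after = DataRecord(" please.")

  // Text nodes and elements side by side, in document order.
  val mixed: Seq[DataRecord[Any]] = Seq(before, billTo, after)

  def render: String = mixed.map(r => DataRecord.toXML(r)).mkString
}
```

The three records hold a String, an Address, and a String, yet DataRecord.toXML renders each one with the writer captured at construction, in document order, while patterns still use the three-field DataRecord(ns, k, v) shape.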