Channel: Silk Engineering Blog

Haskell data types and XML


Here at typLAB it wasn’t evident from the beginning what the best choice for a storage back-end would be. We knew that we were about to build a web-based editor and would be dealing with a lot of HTML5 documents with lots of metadata. After some careful consideration we decided to go for an XML database. More specifically, the Berkeley XML Database, lovingly called DBXML by its authors. We figured that using DBXML would give us some important advantages:

  • Collections of HTML5 documents will form the basis of our data model. Only one trivial conversion from HTML5 to syntactically valid XML is needed to get our documents into the XML database. Once stored we can perform some interesting queries over our data.
  • XML databases allow for the storage of complex data layouts without having a strict schema. Without a schema it will be easier to adjust our data model over time without instantly breaking our software.
  • XQuery is a very expressive (almost-purely functional) querying language which is at least as powerful as SQL and far more flexible in the structure of the data to target.
  • XML can be used to both encode strictly defined datatypes and store free-form documents in the same document collection. This will enable us to put both our metadata and our documents in the same database.
  • A quick look on Hackage revealed there is an out-of-the-box easy-to-use Haskell binding available for the Berkeley XML database. No need to create custom bindings ourselves.
  • We have the advantage (or disadvantage) of having Haskell as our language of choice for our server software. Because of the hierarchical nature of both XML and Haskell algebraic datatypes, an XML database feels like a perfect fit.

Once we decided to go for a DBXML back-end we had to figure out how to easily get values from our Haskell program in and out of the database. The rest of this post deals with the last point of our enumeration: how to get a nice mapping from Haskell’s algebraic datatypes to our DBXML back-end.

XML queries

The DBXML binding for Haskell is a shallow wrapper around the existing C++ API. This library allows us to perform the common create, read, update and delete queries on entire XML documents or parts of them. Communication with the XML database happens mainly via XQuery. Queries and query parameters are passed into the API (and results come out) as Haskell ByteStrings. It is up to the programmer to set up the queries with the right XML structure and encoding. Take for example the (somewhat simplified) type signature of the query function:

query :: Collection -> Query -> Parameters -> IO [ByteString]

This function takes an identifier for the XML collection (somewhat like a database handle), an XML query and a set of query parameters, and returns a possibly empty list of XML snippets as ByteStrings. Too bad that all our domain objects are well-typed Haskell algebraic datatypes and not raw sequences of bytes. We need a simple XML (de)serialization tool for this.

XML picklers

The Haskell XML Toolbox (HXT) is a library containing a (quite extensive) collection of XML processing tools. The library has support for XML parsing, pretty printing, XPath queries, XSL stylesheets, DTD, XSD and RelaxNG schemas and a lot more. Interestingly, HXT exposes a type class and an accompanying set of combinators called XML picklers. XML picklers can be used to build conversion functions from Haskell datatypes to XML and vice versa. The type class looks like this:

class XmlPickler a where
  xpickle :: PU a

So for every type in the XmlPickler type class there is some PU available. The PU datatype is composed of a pair of pickle (serialize) and unpickle (deserialize) functions together with a schema description. Because we won’t be using the schema definitions we will ignore them for now. There is probably no need to ever touch the functions inside the PU type, because the library supplies a large number of basic pickler combinators to be used instead. To illustrate the usage of HXT picklers take this simple Haskell datatype representing a single user in our system:

data User = User
  { username :: String
  , name     :: String
  , password :: String
  , email    :: String
  , openID   :: String
  }

Using some of the basic pickler combinators from the library it is very easy to come up with a suitable XmlPickler instance:

instance XmlPickler User where
  xpickle =
    xpElem "user" $
      xpWrap
        ( \(a, b, c, d, e) -> User a b c d e
        , \(User a b c d e) -> (a, b, c, d, e)
        ) $
      xp5Tuple
        (xpElem "username" xpText)
        (xpElem "name"     xpText)
        (xpElem "password" xpText)
        (xpElem "email"    xpText)
        (xpElem "openid"   xpText0)

This instance uses the xp5Tuple function to combine five sub-picklers into a pickler for one big tuple. Each of the five fields becomes an appropriately named element whose text content holds the value. The xpWrap function then converts between the tuple and a value of the User datatype. This is all the XML serialization and deserialization code you need to write by hand. A bit off topic, but interesting to note, is that xpWrap can be seen as a pickler-specific, bidirectional version of the well-known fmap for Functors: it takes a pair of functions that are meant to be each other’s inverse, that is, a true isomorphism. When we generalize the type of xpWrap to work on arbitrary containers (let’s call this function bifmap) and compare the type signatures, this similarity becomes obvious:

fmap   :: (a -> b)         -> f a -> f b
bifmap :: (a -> b, b -> a) -> f a -> f b
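To make the "pair of pickle and unpickle functions" idea and the bifmap comparison a bit more concrete, here is a minimal sketch of a toy pickler. It is not HXT's actual PU type: the names Pickler, text, pair and Login are hypothetical, the schema part is left out, and values are pickled to a flat list of strings instead of XML. The bifmap defined here plays the role that xpWrap plays in HXT.

```haskell
-- Toy model of a pickler (an assumption for illustration, not HXT's PU).
-- It bundles a serializer with a deserializer that returns leftover input.
data Pickler a = Pickler
  { pickle   :: a -> [String]
  , unpickle :: [String] -> Maybe (a, [String])
  }

-- Primitive pickler for a single text field.
text :: Pickler String
text = Pickler
  { pickle   = \s -> [s]
  , unpickle = \fields -> case fields of
      (s : rest) -> Just (s, rest)
      []         -> Nothing
  }

-- Run two picklers one after the other to obtain a pickler for a pair.
pair :: Pickler a -> Pickler b -> Pickler (a, b)
pair pa pb = Pickler
  { pickle   = \(a, b) -> pickle pa a ++ pickle pb b
  , unpickle = \fields -> do
      (a, rest)  <- unpickle pa fields
      (b, rest') <- unpickle pb rest
      Just ((a, b), rest')
  }

-- bifmap specialised to Pickler: the forward function is applied after
-- unpickling, the backward function before pickling.
bifmap :: (a -> b, b -> a) -> Pickler a -> Pickler b
bifmap (fwd, bwd) p = Pickler
  { pickle   = pickle p . bwd
  , unpickle = \fields -> do
      (a, rest) <- unpickle p fields
      Just (fwd a, rest)
  }

-- Hypothetical two-field record, wrapped around a pair of text fields.
data Login = Login String String
  deriving (Eq, Show)

loginPickler :: Pickler Login
loginPickler =
  bifmap (\(u, p) -> Login u p, \(Login u p) -> (u, p)) (pair text text)
```

Unlike fmap, both directions are needed: a serializer consumes values of the type while a deserializer produces them, so a one-way function is not enough to change the type of a pickler.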

So, taking the XmlPickler instance for our User datatype we can now easily convert users into XML and read them back in, like the following example:

User "jd" "John Doe" "secret" "john@doe" "none"
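<user>
  <username>jd</username>
  <name>John Doe</name>
  <password>secret</password>
  <email>john@doe</email>
  <openid>none</openid>
</user>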

Using the xpickle function from the type class and the xunpickleVal function from the HXT library we can now write a more suitable query function on top of the raw version. This version does not return ByteStrings, but values of any type we can convert to. Of course, the information in the database should match your datatype, otherwise the unpickler function will just produce parse errors resulting in an empty list.

query :: XmlPickler a => Collection -> Query -> Parameters -> IO [a]
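The lifting from a raw query to a typed one can be sketched roughly as follows. This is only an illustration under simplifying assumptions: Snippet, typedQuery and fakeRawQuery are hypothetical names, snippets are plain Strings rather than ByteStrings, and the decoder is passed in explicitly, whereas in the setting above it would be built from xpickle and HXT's unpickling functions.

```haskell
import Data.Maybe (mapMaybe)
import Text.Read (readMaybe)

-- Hypothetical stand-in for the XML snippets returned by the raw query.
type Snippet = String

-- Lift a raw query to a typed one, given a decoder for single snippets.
-- Snippets that fail to decode are dropped, so a mismatch between the
-- stored data and the datatype shows up as a shorter or empty result list.
typedQuery :: (Snippet -> Maybe a) -> IO [Snippet] -> IO [a]
typedQuery decode rawQuery = mapMaybe decode <$> rawQuery

-- Fake raw query standing in for a database call (assumption).
fakeRawQuery :: IO [Snippet]
fakeRawQuery = return ["1", "oops", "3"]

-- Decoding with readMaybe: the malformed snippet is silently skipped.
numbers :: IO [Int]
numbers = typedQuery readMaybe fakeRawQuery
```

Dropping undecodable snippets mirrors the behavior described above. A stricter variant could return an Either with the parse error instead, at the cost of a less convenient result type.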

Although this pickler example for our user is a very simple one, more complicated datatypes, including multi-constructor and possibly mutually recursive datatypes, can quite easily be made an instance of the XmlPickler class as well. Unfortunately we still have to write them all by hand.

Going generic

After a few years at the University of Utrecht we learned at least one valuable lesson: never write functions by hand when they can be derived generically. We decided to write a generic XML pickler function using the generic programming library Regular, developed (not entirely coincidentally) at the University of Utrecht. Regular is a relatively simple but powerful tool for writing datatype-generic functions. The library has support for deriving embedding-projection pairs (conversions from and to a generic representation) using Template Haskell and provides enough reflection to inspect constructor names and record labels. The generic representation is encoded as a type family (the pattern functor, or PF) over the original datatype. The generic pickler function we developed has the following signature:

gxpickle :: (Regular a, GXmlPickler (PF a)) => PU a

This means that for every type that we can convert to a generic representation (indicated by the Regular type class) and for every type that has a GXmlPickler instance for its generic representation, we can deliver a PU. The regular-xmlpickler package implements the GXmlPickler type class and the instances for the types of which the representations are composed. So, all we need now for our User datatype is to derive a generic representation and use the generic implementation for the XmlPickler instance.

$(deriveAll ''User "PFUser")
type instance PF User = PFUser

instance XmlPickler User where
  xpickle = gxpickle

Using this automatically derived XML pickler and the query function described above we can now query the DBXML backend for all users that satisfy a certain property:

jd :: IO [User]
jd = query myCollection "/user[username=$name]" [("name", "jd")]
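For readers who want to experiment with the idea of deriving XML conversions from constructor names and record labels, here is a minimal sketch that swaps in GHC.Generics for Regular. It is not how regular-xmlpickler is implemented: GToXml, toXml, element and Account are hypothetical names, only the serialization direction is shown, only single-constructor records with String fields are supported, and text is not escaped.

```haskell
{-# LANGUAGE DeriveGeneric, FlexibleContexts, FlexibleInstances, TypeOperators #-}
import GHC.Generics

-- Wrap a body in an element with the given name (no escaping: an assumption
-- that field values contain no markup characters).
element :: String -> String -> String
element name body = "<" ++ name ++ ">" ++ body ++ "</" ++ name ++ ">"

-- Generic worker, defined by induction over the representation types.
class GToXml f where
  gToXml :: f p -> String

-- Datatype metadata carries no XML structure here.
instance GToXml f => GToXml (D1 c f) where
  gToXml (M1 x) = gToXml x

-- A constructor becomes an element named after the constructor.
instance (Constructor c, GToXml f) => GToXml (C1 c f) where
  gToXml m@(M1 x) = element (conName m) (gToXml x)

-- A record field becomes an element named after its label.
instance (Selector s, GToXml f) => GToXml (S1 s f) where
  gToXml m@(M1 x) = element (selName m) (gToXml x)

-- Fields of a product are emitted in order.
instance (GToXml f, GToXml g) => GToXml (f :*: g) where
  gToXml (a :*: b) = gToXml a ++ gToXml b

-- Leaf case: a String field is emitted as text.
instance GToXml (K1 i String) where
  gToXml (K1 s) = s

-- Entry point: convert to the generic representation, then serialize.
toXml :: (Generic a, GToXml (Rep a)) => a -> String
toXml = gToXml . from

-- Hypothetical record used only for this sketch.
data Account = Account { username :: String, email :: String }
  deriving (Generic, Show)
```

With these definitions, toXml (Account "jd" "john@doe") yields the string <Account><username>jd</username><email>john@doe</email></Account>. The element name follows the constructor name exactly, so lower-case element names would need an extra renaming step.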

Now we’re able to query a database for pieces of XML and reify these as true Haskell values with almost no boilerplate involved! Because of the bidirectional behavior of the XmlPickler type class it shouldn’t be difficult to imagine that the same trick is applicable to inserting and updating database entries. We will discuss writing generic functions using the Regular library in a later post.

