The first task of the project was to implement a parser for SBML and MathML. There are some existing crates that do this, so my first attempt was to make them work together.
The SBML parser rust_sbml supports serializing and deserializing various components of the SBML specification through the serde library. However, it doesn't support MathML.
The MathML parser, mathml, supports parsing several constructs of the MathML specification, but due to the recursive nature of MathML, the Deserialize trait from serde was not implemented for the MathNode enum.
Trying to deserialize into a recursive enum through serde results in a stack overflow. mathml originally depends on roxmltree, so after several failed attempts to get it working with serde, I tried changing the XML parser to xml-rs and then to quick-xml, but to no avail.
I also posted the error on Stack Overflow here, and the response was that it seems to be a bug in quick-xml and that recursive enums are not mentioned anywhere in their docs.
Later, I also tried the serde-xml-rs plugin for serde, but that led to the same issue.
After this, it was clear that I would have to implement the parser from scratch and the fantastic serde crate was not going to be an option. So I went back to my initial implementation, the one I had made as an evaluation task for the GSoC project.
This was based on the arena approach described here. So I imported SBML and MathML structs from rust_sbml and mathml and started building the parser.
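To make the arena idea concrete, here is a minimal sketch of how a MathML expression like `x + y` could be stored. Instead of nodes owning their children directly (which is what makes the enum recursive), every node lives in one flat `Vec` and refers to its children by index. The names `Node`, `Arena`, `alloc`, `attach` and `build_example` are made up for illustration and are not the actual types from mathml or my parser.

```rust
// Hypothetical, simplified node kinds; the real MathML structs are richer.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    // An <apply> element stores the arena indices of its children.
    Apply { children: Vec<usize> },
    // An operator such as <plus/>.
    Op(String),
    // An identifier such as <ci>x</ci>.
    Ci(String),
}

// The arena owns every node in a flat vector, so no type contains itself.
#[derive(Debug, Default)]
struct Arena {
    nodes: Vec<Node>,
}

impl Arena {
    // Stores a node and returns its index in the arena.
    fn alloc(&mut self, node: Node) -> usize {
        self.nodes.push(node);
        self.nodes.len() - 1
    }

    // Records `child` under `parent`; silently ignored if the parent
    // is not an Apply node.
    fn attach(&mut self, parent: usize, child: usize) {
        if let Node::Apply { children } = &mut self.nodes[parent] {
            children.push(child);
        }
    }
}

// Builds the tree for <apply><plus/><ci>x</ci><ci>y</ci></apply>.
fn build_example() -> Arena {
    let mut arena = Arena::default();
    let apply = arena.alloc(Node::Apply { children: vec![] });
    let plus = arena.alloc(Node::Op("plus".to_string()));
    let x = arena.alloc(Node::Ci("x".to_string()));
    let y = arena.alloc(Node::Ci("y".to_string()));
    for child in vec![plus, x, y] {
        arena.attach(apply, child);
    }
    arena
}
```

Since parents only hold indices, the parser can create a node as soon as it sees an opening tag and link it to its parent afterwards, which fits nicely with how an XML reader hands out events one at a time.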
Along the way, I realized that due to the way XML parsing works, there was going to be a lot of code duplication. For example, the match on "species" would have to be repeated for all tags, keeping in mind their parents, attributes (and their datatypes) every time.
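As a rough illustration of that duplication, here is a sketch of what hand-written handling of start tags could look like. It is not the code from my parser: `Element`, `parse_start_tag` and the choice of attributes are assumptions for the example, and the attributes are passed in as a plain `HashMap` rather than coming from a real XML reader.

```rust
use std::collections::HashMap;

// Hypothetical, simplified elements; the real SBML structs have more fields.
#[derive(Debug, PartialEq)]
enum Element {
    Species { id: String, compartment: String },
    Parameter { id: String, value: f64 },
}

// Builds one element from a start tag. Every tag needs its own arm that
// repeats the same steps: check the parent, fetch attributes, convert types.
fn parse_start_tag(
    name: &str,
    parent: &str,
    attrs: &HashMap<String, String>,
) -> Result<Element, String> {
    // Small helper to fetch a required attribute as an owned String.
    let required = |key: &str| -> Result<String, String> {
        attrs
            .get(key)
            .cloned()
            .ok_or_else(|| format!("{}: missing attribute {}", name, key))
    };

    match name {
        "species" => {
            if parent != "listOfSpecies" {
                return Err(format!("species not allowed inside {}", parent));
            }
            Ok(Element::Species {
                id: required("id")?,
                compartment: required("compartment")?,
            })
        }
        "parameter" => {
            if parent != "listOfParameters" {
                return Err(format!("parameter not allowed inside {}", parent));
            }
            // Same pattern again, plus a datatype conversion for `value`.
            let value = required("value")?
                .parse::<f64>()
                .map_err(|e| e.to_string())?;
            Ok(Element::Parameter { id: required("id")?, value })
        }
        other => Err(format!("unsupported tag {}", other)),
    }
}
```

Even with only two tags the arms are nearly identical, and SBML has many more of them.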
That's when I found out about proc-macros from Jon! And once I had the macro set up, the entire implementation just came down to:
Notice the `attach!` and `push!` macros. They provide a syntax that makes their function really obvious. By separating things with the `as` and `into` keywords, I was able to allow specifying arguments and types as well. This also leads to a little bit of duplication, because arguments have to be specified once here and once in the actual structs that the XML is going to be deserialized into, but it's much better than writing the whole thing every time as shown above.
I have been able to get this to work for MathML as well as SBML components, and I have a basic skeleton ready for both parsers. Most of what I have described above was built in my evaluation task repo, sbml_sim, so I have now created separate repositories, sbml-rs and mathml-rs.
Over the next two weeks, I'm going to implement support for various components in both parsers and finally push them to the crates repository.
As an aside, last night I found the yaserde library, which implements serde support specifically for XML, so I thought I'd give it a shot too, but recursive structs and enums again lead to a stack overflow, as reported on their issues page.
That's all for this week, thanks for reading!