In-Memory-Update library

xml
xquery

#1

Hi all,

I recently started to work with MarkLogic and one of my first use cases was

  • pick up a document from its URI
  • check some element values
  • depending on the previous step, update or remove elements

But it’s not allowed to modify a document in memory :cry:

So I looked for a solution and found 2 possibilities (link): writing a modified clone of the document with a typeswitch, or using In-Memory-Update (easier for me to implement).

I tried both and they worked well, but which is the better choice?
When should each one be used?

Thanks


#2

Excellent question, @johan. First of all, a couple of comments about Dave’s article:

  • I don’t think it makes it clear that the in-mem-update lib follows the same approach as the one from FXSL, in the sense that each call copies the entire tree;

  • the limitation of typeswitch on attribute nodes is not true anymore.

Note also that copying the entire tree means exactly that. Even if you pass an element deep down in the document (far down the tree of elements), the entire document is copied. This is inevitable with XML in (standard) XQuery, and is therefore a drawback of all the solutions discussed in the article.

So if you need to make one single change in an XML document, FXSL and in-mem-update are the easiest ways to express it.

If you need to apply several changes to a tree, or change only part of it (e.g. get a modified copy of a specific sub-element), the recursive descent is probably the best option. Do not forget that the technique is not necessarily tied to typeswitch: you can use any logic you want to drive the recursion. But usually you want to start from the following template:

(:~
 : Template for the recursive descent copy.
 :
 : This function does not actually modify anything as such.  As written, it
 : recursively makes an exact copy of each node; modify it as needed.
 :
 : Note that in addition to typeswitch cases, you can also have regular if/else
 : statements, for more complex conditions, but conditions that can be expressed
 : as typeswitch cases will be more efficient.
 :)
declare function local:modify($node as node())
{
   typeswitch($node)
      (: recursively copy documents and elements :)
      case document-node() return
         document {
            $node/node() ! local:modify(.)
         }
      case element() return
         element { fn:node-name($node) } {
            $node/(@*|node()) ! local:modify(.)
         }
      (: copy attributes, text nodes, comments and PIs :)
      case attribute() return
         $node
      case text() return
         $node
      case comment() return
         $node
      case processing-instruction() return
         $node
      (: in MarkLogic, you can have other node types, like binary() and object-node() :)
      default return
         fn:error((), 'Unexpected node type')
};
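
As a sketch of how the template might be adapted to the use case from the question (update some elements, remove others), the version below adds two element cases ahead of the generic one. The element names `status` and `obsolete`, the replacement value and the sample document are all invented for illustration, and all changes happen in one pass over the tree.

```xquery
xquery version "3.0";

(: Hypothetical adaptation of the template; element names are illustrative only. :)
declare function local:modify($node as node()) as node()*
{
   typeswitch($node)
      case document-node() return
         document { $node/node() ! local:modify(.) }
      (: remove: return the empty sequence instead of a copy :)
      case element(obsolete) return
         ()
      (: update: rebuild the element with new content, keeping its attributes :)
      case element(status) return
         element { fn:node-name($node) } { $node/@*, 'archived' }
      (: any other element: plain recursive copy :)
      case element() return
         element { fn:node-name($node) } {
            $node/(@*|node()) ! local:modify(.)
         }
      (: attributes, text nodes, comments and PIs are returned as is :)
      default return
         $node
};

local:modify(
   <record id="1">
      <status since="2020">active</status>
      <obsolete>old</obsolete>
      <title>Example</title>
   </record>)
```

With the default boundary-space policy, this should give a `record` element in which `status` reads `archived` (with its `since` attribute intact), `obsolete` is gone and `title` is unchanged. Specific cases must come before the generic `element()` case, since typeswitch takes the first case that matches.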

Another difference is that with the FXSL and in-mem-update approach, the returned value is the root of the tree (that is, the top-level element or the document node if any). That is what actually gets copied: the entire tree.

So if you want to retrieve the copy of your original node, the one you pass as a parameter, then you need to retrieve it one way or another from the top of the tree. With the recursive descent approach, you get a copy of the exact node you pass.
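
To illustrate the recursive descent side of that point, here is a minimal sketch that passes a node from deep inside a tree and gets back a modified copy of that node only. The rename of `b` to `strong` and the sample document are assumptions made up for the example.

```xquery
xquery version "3.0";

(: Same shape as the template above; the only change is a hypothetical
   rename of <b> elements to <strong>. :)
declare function local:modify($node as node()) as node()
{
   typeswitch($node)
      case document-node() return
         document { $node/node() ! local:modify(.) }
      case element(b) return
         element strong { $node/(@*|node()) ! local:modify(.) }
      case element() return
         element { fn:node-name($node) } {
            $node/(@*|node()) ! local:modify(.)
         }
      default return
         $node
};

let $doc := <doc><head/><body><p>some <b>bold</b> text</p></body></doc>
(: pass a node deep in the tree, not the root :)
return local:modify($doc/body/p)
```

The result here is a new `p` element, not a copy of `doc`, so there is nothing to navigate back down to. Only the subtree under `p` is visited, which is also why this approach suits changes limited to one part of a document.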

Those are the main differences I can think of (being able to apply all changes in a single pass, and returning the copy of the very node passed as parameter).

So my rule of thumb:

  • for one single and simple change, FXSL or in-mem-update are OK
  • for anything else, use the recursive descent copy

Or even better, use XSLT if you dare :wink: