Recently I ran into an architectural problem when parsing XML with Nokogiri. I used an XPath expression to find child elements in a document. Having concluded that replacing the XPath #search with a hand-rolled #find_all would lead to a better design, I set up a quick benchmark.
The XML contains a root node with 1000 empty child elements.
<root> <item /> <item /> ... 998 times </root>
This is the code I am using now.
Note the use of #search, which evaluates the XPath expression and returns a list of matching nodes.
Here is the replacement code.
Nokogiri::XML(xml).root.children.find_all { |c| c.name == "item" }
Instead of invoking the internal search, I do it myself by checking the name of each child.
xpath:    0.003901151
find_all: 0.014400985
Going the “official” way by using an XPath expression is about 3.7 times faster! Wow.
It turns out that the manual comparison in #find_all is the bottleneck. I guess Nokogiri has some internal optimization that saves the creation of the child nodes.
Nokogiri::XML(xml).root.children
children: 0.003361085
Just fetching the children, without filtering for matching elements, takes about the same amount of time as the XPath search.
Here’s the benchmark code. I’ll keep going with the xpath search.