Recently I ran into an architectural problem when parsing XML with Nokogiri. I was using an XPath expression to find child elements in a document. Having concluded that replacing the XPath #search with a hand-rolled #find_all would lead to a better design, I set up a quick benchmark.
The XML contains a root node with 1000 empty children.
<root> <item /> <item /> ... 998 times </root>
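A fixture in this shape can be generated with a few lines of plain Ruby. This is only a minimal sketch: the helper name build_fixture is made up for illustration and is not taken from the original benchmark.

```ruby
# Hypothetical helper (not from the original benchmark): builds a root
# element containing `count` empty <item /> children separated by spaces.
def build_fixture(count = 1000)
  items = Array.new(count) { "<item />" }.join(" ")
  "<root> #{items} </root>"
end

xml = build_fixture
puts xml[0, 40] # show the beginning of the generated document
```

The string in xml can then be passed to Nokogiri::XML as in the snippets that follow.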
This is the code I am using now.
Nokogiri::XML(xml).root.search("./item")
Note the use of #search, which evaluates the XPath expression and returns a list of matching nodes.
Here is the replacement code.
Nokogiri::XML(xml).root.children.find_all { |c| c.name == "item" }
Instead of invoking the internal search, I do it myself by checking the name of each child.
Benchmarking time.
xpath:    0.003901151
find_all: 0.014400985
Going the “official” way by using an XPath is about 3.7 times faster! Wow.
It turns out that the manual comparison in find_all is the bottleneck. I guess Nokogiri has some internal optimization which saves the creation of the child nodes.
Nokogiri::XML(xml).root.children
children: 0.003361085
Just fetching the children takes about the same amount of time as the XPath search (and that is without filtering for matching elements).
Here’s the benchmark code. I’ll keep going with the xpath search.

Mark Thomas
Yes, the underlying libxml2 library is implemented in C and is very fast, and in Nokogiri you’ll always be better off using built-ins than Ruby code. Note that you could have used .xpath("item") instead of search(), because search() first determines whether you are using XPath or CSS, and without the leading dot-slash it would have assumed CSS, which behaves differently.
nick
Mark: Thanks! I thought xpath("item") would return any item node in the tree, not just the children of the context node. RTFM would have helped. Also, I thought Nokogiri uses libxml2 just for parsing, not searching etc. Makes sense now.