This is an article about optimizing speed in Node.js. Every article I found was about optimizing memory usage or finding memory leaks, but none of those methods helped me resolve a bug that was causing my docxtemplater library to run in 5 minutes instead of 3 seconds. This article is about the tools I used and created to find the source of the issue.
The issue was reported on GitHub. It explained that with inputs containing many elements in an array, generation became very slow (more than 5 minutes to generate one document). The reporter sent me a document to reproduce it. At the same time, I tried `node --prof`, but it didn't tell me more than that "LazyCompile" was taking over 99% of the time. That is why I decided to create a speed test to find out how the speed of the library depended on the size of the input.
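Such a speed test can be sketched roughly like this (a minimal sketch: `generate`, `buildData` and the data shape are placeholders of my own, standing in for the real docxtemplater call and template input):

```javascript
// Build an input whose array has n elements, like the problematic document.
function buildData(n) {
  const items = [];
  for (let i = 0; i < n; i++) {
    items.push({ name: "item" + i, value: i });
  }
  return { items: items };
}

// Measure one run of fn(input) in milliseconds using the high-resolution timer.
function timeRun(fn, input) {
  const start = process.hrtime();
  fn(input);
  const diff = process.hrtime(start);
  return diff[0] * 1e3 + diff[1] / 1e6;
}

// Return one {n, ms} measurement per input size.
function speedTest(generate, sizes) {
  return sizes.map(function (n) {
    return { n: n, ms: timeRun(generate, buildData(n)) };
  });
}
```

Running this for a range of sizes produces the (n, time) table used below.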
This is the data that the program I created gave (n is the length of the array) before optimization:
This is the graph of the data:
The graph doesn't look linear, so we perform a linear regression on a log-log scale to determine the order of the curve.
We find that
ln(time) = 2*ln(n) - 4.5. The slope gives us an order of 2, which means our algorithm is
O(n^2). However, since our algorithm should loop over the data only once, it should be O(n).
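The order can be recovered programmatically with an ordinary least-squares fit on the log-log data. A sketch (the helper name `fitOrder` is mine, not from the article):

```javascript
// Estimate the order k in time ≈ C * n^k by fitting
// ln(time) = k * ln(n) + b with ordinary least squares.
function fitOrder(points) {
  const xs = points.map(function (p) { return Math.log(p.n); });
  const ys = points.map(function (p) { return Math.log(p.ms); });
  const m = xs.length;
  const meanX = xs.reduce(function (a, b) { return a + b; }, 0) / m;
  const meanY = ys.reduce(function (a, b) { return a + b; }, 0) / m;
  let num = 0;
  let den = 0;
  for (let i = 0; i < m; i++) {
    num += (xs[i] - meanX) * (ys[i] - meanY);
    den += (xs[i] - meanX) * (xs[i] - meanX);
  }
  const k = num / den;         // slope: the order of the algorithm
  const b = meanY - k * meanX; // intercept
  return { order: k, intercept: b };
}
```

For data following time = n², this returns an order of 2; after the fix described later, the same fit on the new measurements returns an order of 1.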
It wasn't easy to find where the bottleneck was. I searched for tools for profiling Node.js programs, and finally found a pull request on the code coverage tool istanbul that adds a new "profile" command: Pull request to add profiling command to istanbul
I didn't like that code's output format, so I modified it to write the data as JSON to a file, so that I could analyse it afterwards.
I then used a small jq script to analyse the JSON file (`profile.json` below stands for the output file):
jqq='. | to_entries | sort_by(.value.ms) | .[].value | .filename + "@" + .name + ":" + (.calls|tostring) + ":" + (.ms|tostring) + "ms"'
jq -r "$jqq" profile.json | sed "s:$(pwd)/prof/::g"
This gives us the function calls, ordered by the amount of time taken by each function. The result is the following:
We can see that we go into
DocUtils.clone more than 4 million times, for a total of about 80 seconds. We are cloning the array of data far too many times. The clone function is itself recursive, which is why there are so many function calls. Replacing the implementation of clone with
JSON.parse(JSON.stringify(a)) already gives a huge speed gain. I ended up removing the clone entirely, since its benefit wasn't worth its cost.
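To illustrate the difference, here is a sketch of a recursive deep clone (similar in spirit to DocUtils.clone, but not the library's exact code) next to the JSON round-trip replacement:

```javascript
// Illustrative recursive clone: one JS function call per nested value,
// which is why the profile showed millions of calls.
function deepClone(value) {
  if (Array.isArray(value)) {
    return value.map(deepClone);
  }
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const key in value) {
      if (Object.prototype.hasOwnProperty.call(value, key)) {
        out[key] = deepClone(value[key]);
      }
    }
    return out;
  }
  return value; // primitives are returned as-is
}

// The faster replacement: one native round-trip instead of millions of
// recursive JS calls. Note it drops functions, undefined, and turns
// Dates into strings, so it only fits plain JSON-like data.
function jsonClone(value) {
  return JSON.parse(JSON.stringify(value));
}
```

Of course, the biggest win was not cloning at all when a clone isn't needed.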
After optimizing, we see the following data:
The linear regression for this graph shows:
ln(time) = ln(n) - 1.8
This confirms that the order is indeed 1, which is what we wanted.
Here is a graph showing the difference: the first curve is before the speed improvement, the second after. The gap is so large that the new version looks completely flat next to the old one.
I'm really happy that docxtemplater is now much faster, and I learnt something new about finding bottlenecks: if the profiling tools don't fit your needs, something very simple can do the job!