Optimizing speed (in Node.js)

This is an article about optimizing speed in Node.js. Every article I found on the topic was about optimizing memory usage or tracking down memory leaks, and none of those methods helped me resolve a bug that made my docxtemplater library run in 5 minutes instead of 3 seconds. This article is about the tools I used and created to find the source of the issue.

The issue was created on GitHub. It explained that for some inputs with many elements in an array, the generation became very slow (taking more than 5 minutes to generate one document). The creator of the issue sent me a document to reproduce it. At the same time, I tried to use `node --prof`, but it didn't give me more information than the fact that "LazyCompile" was taking more than 99% of the time. That is why I decided to create a speed test to find out how the speed of the library depended on the size of the input.
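
Here is a minimal sketch of such a speed test (generateDocument is a hypothetical wrapper around the docxtemplater rendering call, not the library's actual API):

// measure how the generation time scales with n, the length of the array
var sizes = [50, 100, 200, 350, 500, 750, 1000, 1250, 1500, 1750, 2000, 2250, 2500];

sizes.forEach(function (n) {
    var data = {items: []};
    for (var i = 0; i < n; i++) {
        data.items.push({name: "item " + i});
    }
    var start = Date.now();
    generateDocument(data); // hypothetical: renders the template with this data
    console.log(n + "\t" + (Date.now() - start) + "ms");
});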

This is the data that my speed test produced before optimization (n is the length of the array):


n     time(ms)

50    94
100   160
200   473
350   1407
500   2931
750   6918
1000  11106
1250  17345
1500  23982
1750  32912
2000  41297
2250  51642
2500  66122

This is the graph of the data:

[Graph: raw timing data, unoptimized]

The curve doesn't look linear, so we do a linear regression on a log-log scale: the slope of the fitted line gives the order of the curve.

[Graph: log-log plot, unoptimized]

We find that ln(time) = 2*ln(n) - 4.5. The slope gives an order of 2, which means our algorithm runs in O(n^2). Indeed, doubling n roughly quadruples the time (17345ms for n=1250 vs 66122ms for n=2500). However, since the algorithm should loop only once over the data, it should be O(n).

It wasn't easy to find where the bottleneck was. I searched for tools to profile Node.js programs, and finally found a pull request on the code coverage tool istanbul that adds a new "profile" command: Pull request to add profiling command to istanbul.

I didn't like the output format of that code, so I modified it to write the data as JSON to a file, so that I could analyse it at the end.

Here is my version of the profiler on github.

I then used a small jq script to analyse the JSON file:


jqq='. | to_entries | sort_by(.value.ms) | .[].value | .filename + "@" + .name + ":" + (.calls|tostring) + ":" + (.ms|tostring) + "ms"'

# profile.json stands for the JSON file written by the modified profiler;
# the sed call strips the absolute path prefix from the filenames
jq -r "$jqq" profile.json | sed "s:$(pwd)/prof/::g"

This gives us the function calls, ordered by the amount of time spent in each function. The result is the following:


scopeManager.js@functorIfNotInverted:403:55462.439ms
scopeManager.js@loopOverValue:11:55463.694ms
scopeManager.js@loopOver:11:55464.606ms
xmlTemplater.js@dashLoop:10:55787.118ms
xmlTemplater.js@calcSubXmlTemplater:414:69817.831ms
xmlTemplater.js@forLoop:11:69825.111ms
xmlTemplater.js@replaceLoopTag:11:69855.047ms
xmlTemplater.js@loopClose:1754:70233.641ms
xmlTemplater.js@handleCharacter:24270:70334.91ms
xmlTemplater.js@forEachCharacter:416:70490.983ms
xmlTemplater.js@compile:416:70513.029ms
xmlTemplater.js@render:416:70514.29ms
docUtils.js@DocUtils.clone:4086081:82243.956ms

We can see that we go into DocUtils.clone more than 4 million times, for about 80 seconds. We are actually cloning the array of data far too many times. The clone function is itself recursive, which is why there are so many function calls. If we replace the implementation of clone with JSON.parse(JSON.stringify(a)), we already get a huge speed gain. I ended up removing the clone entirely, since it wasn't useful enough to justify its cost.
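
As a sketch (the real DocUtils.clone differs in its details), the intermediate fix amounts to something like this:

// Before: a hand-rolled recursive deep clone; every nested value costs
// one function call, which is what showed up in the profile
function clone(value) {
    if (value instanceof Array) {
        return value.map(clone);
    }
    if (value !== null && typeof value === "object") {
        var copy = {};
        Object.keys(value).forEach(function (key) {
            copy[key] = clone(value[key]);
        });
        return copy;
    }
    return value;
}

// After: serialize/deserialize in one pass; only valid for JSON-safe data
// (no functions, dates become strings), but much cheaper for this workload
function clone(value) {
    return JSON.parse(JSON.stringify(value));
}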

After the optimization, the speed test gives the following data:


n     time(ms)
50    57
100   49
200   56
350   92
500   105
750   130
1000  178
1250  222
1500  257
1750  287
2000  326
2250  384
2500  430

[Graph: raw timing data, optimized]

The linear regression for this graph shows:

[Graph: log-log plot, optimized]

ln(time) = ln(n) - 1.8

This validates that the order is indeed 1, which is what we wanted: doubling n now roughly doubles the time (222ms for n=1250 vs 430ms for n=2500).

Here is a graph showing the difference: the first curve is before the speed improvement, the second curve after. The difference is so huge that the new version looks completely flat next to the old one.

[Graph: comparison of both versions]

I'm really happy that docxtemplater is now much faster, and I learnt something new about finding bottlenecks: if the profiling tools don't fit your needs, something very simple can do the job!

Bisecting your .vimrc

When you have a problem with vim but don't know exactly where it comes from, many people recommend bisecting your .vimrc. Here is a small tutorial on how to do it.

The following gist contains my .vimrc, which has a bug: it is no longer possible to combine d with /. For those who aren't that used to vim, here is first an explanation of what d/ does.

Gist of the .vimrc

What is d/?

In vim, you can compose actions (often also called verbs) and motions (often called nouns). For example, there is the action delete, which has the letter d, and the motion $, which means "go to the end of the line". When you hit d$, vim deletes text up to the end of the line.

Another motion is /, the search motion. You first enter /, then the text you want to search for, and press enter to go there. You can normally combine d (the delete action) and / (the search motion) to delete text up to the first occurrence of a pattern.

For example, with the following file :

My dog is called Johny.

If your cursor is on the "M" and you press d/called<Enter>, the remaining text will be:

called Johny.

Resolving the issue

Here is how I usually solve problems like that: I start by opening two shell windows (I personally like to use tmux, a terminal multiplexer).

  • The first window is for editing my .vimrc,
  • The second window is for testing whether vim behaves correctly with my .vimrc. In my case, I created a file test.txt containing "My dog is called Johny." and ran vim test.txt (see the sketch after this list).
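
A minimal sketch of that setup, as plain shell commands (one per tmux pane):

# pane 1: edit the configuration
vim ~/.vimrc

# pane 2: create the test file and try to reproduce the bug
echo "My dog is called Johny." > test.txt
vim test.txt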

Then I try to find a way to reproduce the issue. In my case, I hit d/called<Enter>. I call this the run step:

  • If no text is deleted, the bug is still present
  • If text is deleted, the bug is not there anymore.

In either case, I then go to the first window to edit my .vimrc and try to remove some code from it. I usually start with the plugins, because they are the culprit most of the time. Here I would delete all the code from the Bundles section up to the end of the Bundle configuration.

In my case, if I then do the run step again, the problem disappears, so we have some reason to believe that it comes from one of the plugins. What I do next is go back to the .vimrc and undo the deletion. I then delete only half of the plugins. Every time the bug is still present, I delete half of the remaining plugins; if the bug disappears, I undo my change and delete the other half instead. I repeat this until only one plugin is enabled: that plugin is causing the issue.

I then get to a very minimal .vimrc which still has the bug:

set nocompatible

" set the runtime path to include plug.vim
if has('vim_starting')
  " Required:
  source ~/.vim/autoload/plug.vim
endif

call plug#begin()
Plug 'vim-scripts/SearchComplete'
call plug#end()

You can then report the issue to the plugin creator (or decide not to use the plugin anymore).

Make your tests run in less than a second

What I did

For docxtemplater, the open-source project I work on, the whole test suite runs in less than 500ms (280ms on my machine, 345ms on Travis). It contains more than 1000 assertions in total, in about 50 describe statements. This gives very fast feedback on what you change, and is extremely pleasant to work with. On my computer, I have created a command that watches every file save made from vim and triggers the tests automatically, turning a USB light green or red depending on the result of the tests.

I have written a small bash script called swatch, with which you can run swatch npm test, meaning "watch vim file saves and run the given command every time that event happens". Source code of swatch

Having a very fast test suite allows you to integrate your test checks into your workflow, without having to run the tests manually once you think you have finished a feature. When your code is complex, it is a great way to refactor without waiting to see if your changes broke something.

How to speed up your test suite

You can do a few things to improve the speed of your test suite:

First things first: if your tests need a lot of time, maybe your code is just slow. An HTTP request (including data transfer and business logic) should take less than 500ms; if it is slower, you should be more concerned about the speed of your application.

If you test only the business logic, your code should take less than 50ms to run with normal data.

Here is a small list of what you should avoid in a test suite:

  • Reads/Writes on the filesystem
  • Interactions with a database
  • HTTP requests

I'm not a huge fan of mocks, because maintaining them wastes a lot of developer time. Try to minimize the number of mocks you write: first analyse where the biggest bottlenecks are (this depends on the system under test) and mock only those.
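
For instance, here is a hedged sketch of that idea without any mocking library: the slow dependency is injected, so the test can swap it for an in-memory fake (all names here are made up for the example):

// production code: the HTTP layer is passed in instead of being hard-coded
function makeUserService(fetchUser) {
    return {
        greeting: function (id) {
            return fetchUser(id).then(function (user) {
                return "Hello " + user.name;
            });
        }
    };
}

// test code: replace the HTTP call with a fake that resolves instantly
var service = makeUserService(function () {
    return Promise.resolve({name: "Ada"});
});
service.greeting(1).then(function (text) {
    console.assert(text === "Hello Ada");
});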

I'm not saying that you should use mocks in every program. For example, in my docxtemplater project I don't mock anything and still do a few reads on the filesystem. If I mocked the calls to the filesystem, I know it could improve the speed of my test suite even more, but I don't think it is worth the effort because the speed is already very satisfying.

Don't run your whole test suite every time

Many testing frameworks have an option to filter tests, so that you can choose which tests you want to run. If you are working only on a part of an application, you can narrow the tests you run to get faster feedback.

For example, with mocha, you can mark a suite with the .only property:

describe.only("This is the only test that will run", function () {
    // TESTCODE
});

I’d be happy to help you improve the speed of your tests

If you maintain an open-source project and would like to speed up your test suite, I'd like to help you for free (just contact me at contact{AT}{this-domain-name}).

Git good practices

Here are a few git tips that I find essential:

1. Use git on all your projects, even your personal projects

When you work on a project on your own, you might think you don't need git because you don't need to share your code. However, git is not just a way to share code, but a way to retain history. By using git, you will be able to look up older versions of your code. For example, if you find a bug that you didn't have before, you can use git bisect to find out which commit introduced it, which makes debugging much easier. You will also be able to develop features separately and merge them back into your master branch whenever they are finished.

Most importantly, if you make changes that you think lead nowhere, you can reset your code to the latest commit with git reset --hard, or save those changes for later by creating a stash with git stash save "stash message". And if your memory isn't great, you will also be able to see if and when you developed a feature.

2. Rewrite your history, because a clean history makes things simpler

I very often do interactive rebases with the git command git rebase -i <commit-id>. It gives you the possibility to change the commits made after <commit-id>. Be careful not to change published history. What I use most of the time:

  • fixup can meld two commits into one,
  • reword lets you change the commit message of a commit, when you made a typo in your commit message, or just want to add a summary to that commit message.
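
As an example, the todo list git opens during an interactive rebase looks like this (the commit hashes and messages are made up):

pick a1b2c3d Add CSV export
fixup e4f5a6b Fix typo in CSV export
reword 9c8d7e6 Add PDF export

Here the fixup line melds the typo fix into the commit above it, and the reword line makes git stop so you can edit that commit's message.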

With a clean history, you will understand the project history much better: what features were developed, by whom, and what parts of the code were changed to achieve each feature. Lots of people are impressed that I don't need to ask someone how they added a specific feature; the secret is just to look at the project history.

3. Have a strict convention for your commit messages

I personally like to have my commit messages written in English, in the imperative present form, e.g.:

BAD: added XYZ feature
GOOD: Add XYZ feature

Also, I recommend following the advice from tpope's blog post about commit messages.

4. Don't use git commit -m "message"

I think it is bad practice to use the -m option of git commit, because it doesn't push you to take time to think about your commit message. It's also fewer letters to type (no -m ""). Just start using your editor to write those commit messages, and you will see their quality increase significantly.

5. Don't add unrelated stuff to a commit

If you find a typo somewhere while you're coding a new feature and want to commit the fix, don't put it in the same commit as your feature. If you do, it is no longer possible to tell from the commit messages when you fixed the typo. Also, if you git revert that commit because you want to remove the feature, you will also revert the typo fix, which you might not even notice.

6. Use git add --patch or git add --interactive

When using git add file, or even git add ., you lose a lot of control over what goes into your commit and what doesn't. By using git add <file> --patch, you can select which lines you would like to put into your staging area and which lines you want to keep out of it.
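
A typical session looks roughly like this (the file name is an example, and the exact prompt varies between git versions): git shows each hunk of changes and asks what to do with it.

$ git add --patch src/app.js
...
Stage this hunk [y,n,q,a,d,s,e,?]?

Answer y to stage the hunk, n to skip it, or s to split it into smaller hunks.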

7. It’s ok to work however you want, as long as the published history is clear

I personally use tig, a text-based interface for git that runs in the terminal. There are many GUIs that can manage your git repository, for example SourceTree. Be aware that knowing the git command line is very useful for all advanced tasks; it also works even when you don't have a display (such as when you're working over ssh).

Riot overview: the new React-like JS micro-framework

Over the weekend, I played around with Riot. Riot is very sharply focused on doing one thing right. The library is very small: 3.5kB minified. One file is responsible for one UI element, for example a todo list, and that file contains both the HTML and the JS for that component. A UI element is represented in your application by a custom tag, for example <todo-list>. You are responsible for loading data for that view and for anything else that is not directly bound to the user interface. That way, you could very easily replace just the storage engine of one particular UI element, for example.

The syntax is very easy to learn and still very expressive. For example, Riot lets you set the checked attribute on a checkbox whenever one of your variables is truthy, giving you a checkbox that is checked by default.
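
A minimal sketch of such a tag (the tag name and the opts.done option are made-up names for the example):

<todo-item>
  <input type="checkbox" checked={ opts.done }>
</todo-item>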

The JS you use inside your tags can be written in ES6, and the code will automatically be transpiled into working ES5.

So, for example, you could write a whole tag like this:
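
(A sketch from memory of the Riot tag format; the tag name and data are made up, and the arrow function is the ES6 part that gets transpiled.)

<todo-list>
  <ul>
    <li each={ items }>{ title }</li>
  </ul>

  this.items = opts.items.filter(item => !item.done)
</todo-list>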

You also get support for loops, transclusion, and scoped CSS (styles that apply only inside your custom tags).

The recommended design is an event-based architecture: you load data and keep your business logic separate from the UI, and you communicate changes from the store to the UI. The UI sends events to a global dispatcher (which can be done with RiotControl, a library of about 10 lines), the dispatcher forwards the events to the stores, and the stores may send events back to the UI as a result. With this, you have implemented a Flux architecture.
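
Here is a hedged sketch of that loop, using riot.observable (Riot's built-in observable helper) rather than RiotControl's actual API; the event names are made up:

// a central dispatcher that both the UI and the stores can talk to
var dispatcher = riot.observable({});

// store side: answer requests with data
dispatcher.on("todos_requested", function () {
    dispatcher.trigger("todos_changed", ["buy milk", "write blog post"]);
});

// UI side: re-render when the store answers, then ask for the data
dispatcher.on("todos_changed", function (todos) {
    console.log("render", todos);
});
dispatcher.trigger("todos_requested");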

The documentation of Riot 2.0 is very well written, easy to grasp, and completely up to date.

Riot is a UI framework you should definitely take into consideration.

How to run io.js in Travis CI

UPDATE: Travis CI now supports io.js

It's now much easier to use io.js on Travis: just write the following in your .travis.yml.
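
To the best of my knowledge, this is the configuration Travis documented for io.js at the time:

language: node_js
node_js:
  - iojs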

Base Article

Here’s just a small code snippet to show how you can run your tests with io.js in Travis CI.

At the time of writing, Travis CI doesn't support io.js like it does other Node versions, so you have to use the before_script section of your .travis.yml.

Here's what the .travis.yml could look like.
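
Here is a sketch of that kind of configuration (assumptions: nvm on the Travis image can install io.js, and mocha is the test runner; the exact snippet may have differed):

language: node_js
before_script:
  # install io.js alongside the node version Travis provides
  - nvm install iojs
script:
  # run the test runner through the iojs binary explicitly
  - iojs node_modules/.bin/mocha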

Of course, you could also do this for mocha or any other CLI tool.

Hope this helps!

Interviewers: stop the puzzle games!

I was actively looking for a job in a software company in the past three months (although I also did some contract work), and I failed at a puzzle-game test because "I was too slow to answer the questions".

Here are some of the questions I was asked:

You are given a string variable $str.
Write PHP code to print the given string with every third letter removed.
For example, if the input string is "Apple iPhone", the output should be "ApleiPon".

Array $a contains integer numbers. Write PHP code to print the index of the element that contains the maximum value among the array values that are even numbers (divisible by 2).

The questions and quizzes I'm going to talk about are the 'basic' quizzes, where the task is essentially algorithmic: e.g. write a function that sorts an array, write a function that finds an integer inside an array, etc.

tl;dr

There are many reasons why I think this kind of question doesn't make sense for choosing between candidates:

  1. Some people will cheat
  2. It doesn’t show how you organize your code
  3. It doesn’t show how you work with others

Some people will cheat

In the book "The Honest Truth About Dishonesty", Dan Ariely explains that most of the effect of cheating doesn't come from a few big cheaters but rather from many small ones.
In other words, few people are tempted to put a lot of effort into cheating, but many people will put in some effort to increase their score artificially.
For example, that could be done with one or more of the following tricks (ordered from most difficult to easiest):

  • If the test doesn't require you to log in, you can take it twice, the first time providing false information. This way, you can prepare yourself for the questions and achieve a higher score than you normally could.
  • If you have unlimited time for the test (e.g. one week to complete it), then you can easily ask a friend for help, or even post the questions on a forum.

It doesn’t show how you organize your code

One of the biggest problems with quiz-like questions is that you don't encounter them in real life. Most languages already ship functions that sort arrays, or you can easily find a library that takes care of it for you.

The real difficulty in a codebase is not how to sort an array (and if it is, your programmers are probably not that good). The real difficulty in a codebase is how to organize your code.

I would also like to talk about a very positive experience from my job search: the test at the Financial Times Lab. They ask a set of "Stack Overflow"-style questions, e.g. what is the difference between

helloWorld();
function helloWorld() {
    console.log(1);
}

and

helloWorld();
var helloWorld = function () {
    console.log(1);
};

and why they give different results. (The first snippet logs 1, because function declarations are hoisted together with their body; in the second, only the var declaration is hoisted, so helloWorld is still undefined when it is called and the code throws a TypeError.)

It doesn’t show how you work with others

Working with others is a huge part of what a programmer does. To achieve a task, some work must be done, but what takes the most effort is the communication between members of the team. If the communication doesn't work well, your project isn't going to meet its deadline.

Work in progress

Your support tickets are just the tip of the iceberg

[Image: an iceberg, captioned "The tip of the iceberg"]

If you run a product with some kind of support system, you're surely managing at least a few support tickets per month, depending on the number of customers. You might think "why are my users so stupid?", but those tickets are not even the tip of the iceberg. Think about it from the user's perspective: how many times a day are you annoyed by something? Probably somewhere between five and ten. And how often do you take the time to report the problem to the app's creator? Probably once a day, and only if you're quite a world improver.

There are so many broken things that most of them will never reach you, because other things are much more annoying.

By removing the most painful problems first, you will start getting feedback about the less urgent ones.

Vim serving as a model for keyboard-based software

vim is a good example (if not the best) of a keyboard-based application. Vim has been alive and maintained for more than 20 years (40 if you count its predecessor ex and its visual mode vi), which shows that such an application is robust and doesn't get overtaken by new, sleek editors.

Vim's power comes from its inner coherence. The letter `d` always means delete. The letter `w` means a word, so `dw` means "delete the next word". `$` means the end of the line, so `d$` means "delete to the end of the line". By convention, the capital letter of an action applies it to the end of the line, so `D` deletes all characters to the end of the line. In the same way, `C` (change) deletes all characters to the end of the line and puts you in insert mode (the mode that lets you use Vim like any other editor).

I'm trying to do the same in my own software: `j` and `k` are used to go up and down a list (see the sketch after this list). What is very great about keyboard shortcuts is the following:

* They never take space on the screen, giving you the best experience
* They go into your muscle memory, so you won't need to look at the keyboard to go to the next item in the list (just press the key with the little mark on it 🙂 )
* They make it possible to add functionality without losing clarity: you don't need to learn the keyboard shortcuts, but if you want to go one level up, you can.
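
Here is a minimal sketch of such j/k navigation for a web UI (assumptions: a <ul id="list"> element whose <li> entries get an "active" class; this is an illustration, not the actual code of my software):

// move a highlight up and down a list with j and k, vim style
var items = document.querySelectorAll("#list li");
var index = 0;

document.addEventListener("keydown", function (event) {
    if (event.key === "j") {
        index = Math.min(index + 1, items.length - 1);
    } else if (event.key === "k") {
        index = Math.max(index - 1, 0);
    } else {
        return;
    }
    for (var i = 0; i < items.length; i++) {
        items[i].classList.toggle("active", i === index);
    }
});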

Features: Value vs Cost

When you create a product, you probably discuss features a lot. Here are a few possible questions:

Where should we put that button?

Who should get access to that feature?

How should we introduce that new feature (blog post, tutorials, …)?

How should we implement that feature?

There are probably many more questions you can ask yourself when you think you're going to implement a feature.

However, one question that is essential and often not asked is the following one:

Should we add this new feature at all?

I think most of the time, the answer should be No.

This is because of how costly features are. I have put together a few questions that you can ask yourself the next time you're thinking about a new feature, to help you weigh its value against its costs.

Value

  • Will your customers use that feature often, or is it very useful even if it is rarely used?
  • Does your software sell better if the feature is present?
  • Will the feature benefit many customers?
  • Are you the only one to provide that value?

Costs

  • How long will it take you to develop that feature?
  • Are there bugs in your software that will take longer to fix because you are busy working on the new feature?
  • Are there pending support tickets that will stay pending because of the new feature's development?
  • Will the feature you're building increase complexity?
  • Is it easy to drop the feature if it doesn't fit the customers' needs?
  • What will be the cost of maintaining that feature?
  • Is the new feature likely to introduce a security hole?

Hope that helps!