Fabio A.

Reputation: 2724

Using symlinks to store data

An application I am working on needs a counter whose value persists across invocations, so that each time the application starts, the value is read back and counting continues from there. The value should be stored in human-readable form, so that it can be easily inspected if the need arises, and it should be updated atomically, so that a failure won't corrupt the previously stored value.

Using a plain old text file seemed too boring, so after some creative thinking it occurred to me I could achieve the same goal by storing the counter as a symbolic link target.

Basically, using sh as a prototyping language, instead of doing

echo "$counter" > file.tmp && mv file.tmp file || rm -f file.tmp

I would do

ln -s "$counter" file.tmp && mv file.tmp file || rm -f file.tmp

The advantage of the latter approach is that I need only one syscall (symlink) to write the value, as opposed to at least three (open, write, close) in the former case.

As an added bonus, doing an ls -l from the shell automagically displays the file's content:

$ ls -l the.counter.is
lrwxrwxrwx 1 fabio fabio 4 mar  7 01:08 the.counter.is -> 1234
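For reference, the whole read-and-update cycle might be sketched in sh as follows. This is only a sketch under some assumptions: `counter_file`, `read_counter` and `write_counter` are names made up for this illustration, and `readlink` is assumed to be available (it is on common Linux systems, but older POSIX versions do not require it).

```shell
#!/bin/sh
# Hypothetical names for illustration; the counter is the symlink's target.
counter_file=${counter_file:-./the.counter.is}

read_counter() {
    # readlink prints the link target without following it.
    # Fall back to 0 when the link does not exist yet.
    readlink "$counter_file" 2>/dev/null || echo 0
}

write_counter() {
    tmp="$counter_file.tmp.$$"
    # Create the new link under a temporary name, then rename it over the
    # old one, so the old value stays in place if anything fails.
    ln -s -- "$1" "$tmp" && mv -f "$tmp" "$counter_file" || {
        rm -f "$tmp"
        return 1
    }
}

counter=$(read_counter)
write_counter $((counter + 1))
read_counter
```

Each run of the script prints the new value of the counter. Note that the read and the write are two separate steps, so this sketch does not protect against two processes incrementing at the same time.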

As for performance, a test program that compares the two approaches (see it here) gives results on my PC that match expectations, with the symlink approach around 7 times faster than the standard approach (note that the test doesn't care about atomicity):

$ uname -a && ./linkfile 10000 4095 /tmp/test
Linux Fabio-Asus 4.8.0-40-generic #43-Ubuntu SMP Thu Feb 23 16:01:19 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Starting test... [10000, 4095]
writeToFile: 155.537ms
writeToLink: 23.4132ms

However, on coliru I get a different result, slightly in favour of the standard approach:

$ uname -a && g++ -O3 -o test main.cpp && sync && ./test 10000 4095 x
Linux stacked-crooked 4.4.0-57-generic #78-Ubuntu SMP Fri Dec 9 23:50:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Starting test... [10000, 4095]
writeToFile: 21.8001us
writeToLink: 33.9217us

The test consists of 10000 iterations for each approach, writing 4095 bytes at each iteration and averaging the execution times.

The reason for the 4095 bytes is that anything longer causes the symlink syscall to fail with ENAMETOOLONG.
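The limit can be probed from the shell with something like the sketch below. `try_target_len` is a made-up helper, `mktemp -d` is assumed to be available, and the largest length that succeeds may differ between filesystems, so no particular output is claimed here.

```shell
#!/bin/sh
# try_target_len N (hypothetical helper): report whether a symlink whose
# target is N bytes long can be created in a scratch directory.
try_target_len() {
    dir=$(mktemp -d) || return 2
    # Build a string of N 'x' characters: pad an empty string to width N
    # with spaces, then turn the spaces into 'x'.
    target=$(printf "%${1}s" '' | tr ' ' 'x')
    if ln -s -- "$target" "$dir/link" 2>/dev/null; then
        result=ok
    else
        result=failed
    fi
    rm -rf "$dir"
    echo "$1 bytes: $result"
}

try_target_len 4095
try_target_len 4096
```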

So, the questions are:

  1. Has anybody, aside from crazy me, ever used this approach to store data before?
  2. If yes, for what use cases?
  3. Bearing in mind that my PC sports an i7-6500U CPU @ 2.50GHz, do you have any idea why on coliru the standard approach is so much faster than on my PC, both relative to the symlink approach and in absolute time? If it's because of some cache, why wouldn't it have an effect on my PC too, and why would it not benefit the symlink approach as well?

Upvotes: 4

Views: 733

Answers (1)

codeforester

Reputation: 43039

My answers:

  1. Yes, I have seen symlinks being used for storing data. As you have already explained, there is a huge performance gain in storing small pieces of data in a symlink rather than a file. I believe the symlink value is stored directly in the inode, which makes it more storage efficient. Another huge advantage is atomicity: symlink creation is an atomic operation, which helps in handling concurrency issues.
  2. The values stored in symlinks are mostly application-specific metadata. For example, if I have to build a parser that incrementally parses a large number of dynamic log files, I might want to store the last byte position read in a symlink. Symlinks can also be used for implementing locks. I have seen cases where flock was not reliable on NFS and symlinks were used instead.
  3. I am not sure about this. Perhaps there is an implementation difference?
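As a rough illustration of the locking idea in point 2, a symlink-based lock could look like the sketch below. The names `lock_path`, `acquire_lock` and `release_lock` are invented for this example, `readlink` is assumed to be available, and stale locks left behind by a crashed process are not handled.

```shell
#!/bin/sh
# Hypothetical names for illustration only.
lock_path=${lock_path:-./parser.lock}

acquire_lock() {
    # Creating a symlink fails if the name already exists, so only one
    # process can succeed. The target records the owner's PID.
    ln -s -- "$$" "$lock_path" 2>/dev/null
}

release_lock() {
    rm -f -- "$lock_path"
}

if acquire_lock; then
    echo "lock held by $(readlink "$lock_path")"
    # ... critical section, e.g. update the stored byte position ...
    release_lock
else
    echo "already locked by $(readlink "$lock_path")"
fi
```

Storing the PID as the link target is a design choice of this sketch: it makes the current owner visible with a plain ls -l, in the same way as the counter in the question.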

Upvotes: 2
