Genelia D'souza

Reputation: 125

Performance analysis of a functionality with different approaches on Linux

I know this question is more compiler- and OS-dependent, but if anyone can shed some light on it, that would help me do some optimization.

My goal is to create a file x in a folder y.

(There could be millions of these, and x and y change on every call.) I am working on Linux.

To accomplish this I have two ways:

First do a chdir to required directory 'y' and then create the file 'x'.

C code:

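const char *dir = "/root/";
FILE *fd;
chdir(dir);                             /* switch to the target directory */
fd = fopen("geneliatestingN", "a+");    /* path is relative to the new cwd */
if (fd != NULL) {                       /* fopen fails if the file cannot be created */
    fprintf(fd, "ansh");
    fclose(fd);
}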

Strace:

1329039557.874631 chdir("/root/")       = 0
1329039557.874704 brk(0)                = 0x9ad6000
1329039557.874726 brk(0x9af7000)        = 0x9af7000
1329039557.874757 open("geneliatestingN", O_RDWR|O_CREAT|O_APPEND, 0666) = 3
1329039557.874817 fstat64(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
1329039557.874869 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fcb000
1329039557.874899 write(3, "ansh", 4)   = 4
1329039557.874940 close(3)              = 0

The second way is to just provide the absolute path of the file and create it.

C code:

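char filepath[PATH_MAX];                /* PATH_MAX comes from <limits.h> */
snprintf(filepath, sizeof filepath, "%s/geneliatestingS", dir);
fd = fopen(filepath, "a+");             /* absolute path, no chdir needed */
if (fd != NULL) {
    fprintf(fd, "ansh Testing again");
    fclose(fd);
}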

Strace:

1329039557.875000 open("/root//geneliatestingS", O_RDWR|O_CREAT|O_APPEND, 0666) = 3
1329039557.875046 fstat64(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
1329039557.875096 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fcb000
1329039557.875123 write(3, "ansh Testing again", 18) = 18
1329039557.875160 close(3)              = 0

So which of these two is the better way to accomplish this? Basically, which one consumes fewer instruction cycles and is more efficient in both CPU usage and execution time?

Upvotes: 2

Views: 157

Answers (2)

kfmfe04

Reputation: 15327

You will be I/O bound, not CPU bound.

You may be optimizing at the wrong level: whatever you do with this design, your CPUs will sit waiting while your drives grind.

Off the top of my head, I would definitely look into:

  • How long your HDD head seeks take. I'd bet that sorting as many of those directories as possible before writing to disk, to minimize head seeks, would gain you more than cutting down CPU cycles.
  • Totally redesign your system from the top: consider other models besides writing 10 million directories. Could you write a smaller number of files and perhaps use mmap() instead? Note that the issue isn't just writing this data to disk: your design choice could drastically affect how quickly you can read this data back into memory when users want access. For example, if you have ten users who want files from all different parts of the filesystem, your HDD will be your bottleneck.
  • Depending on your application, NoSQL databases may work better for you.
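As a rough sketch of the sorting idea from the first bullet: if the pending file paths are collected in memory first, a plain lexical sort puts files from the same directory next to each other. The function names here are made up for illustration, and whether this actually reduces head seeks depends on how the filesystem lays directories out on disk, so treat it as something to measure rather than a guaranteed win.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings (char *). */
int compare_paths(const void *a, const void *b)
{
    const char *const *pa = a;
    const char *const *pb = b;
    return strcmp(*pa, *pb);
}

/* Sort pending paths lexically so that files sharing a directory
 * prefix end up adjacent in the array. */
void sort_pending_paths(const char **paths, size_t count)
{
    qsort(paths, count, sizeof paths[0], compare_paths);
}
```

After calling sort_pending_paths() you would walk the array in order and create the files, so that all writes into one directory happen together.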

Upvotes: 3

The only way to get a good answer is to try on your particular system and measure.

However, I would expect minimizing the number of system calls to be better; the absolute-path version avoids the extra chdir() call.
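To actually measure it, something like the following could be used. It is only a minimal sketch: the helper names (create_via_chdir, create_via_path, time_creates) are invented here, the directory is assumed to be an absolute path, and wall-clock timing with clock_gettime() will mostly reflect I/O and caching rather than CPU cycles.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Approach 1: chdir() into the directory, then create the file by relative name. */
int create_via_chdir(const char *dir, const char *name)
{
    if (chdir(dir) != 0)
        return -1;
    FILE *fp = fopen(name, "a+");
    if (fp == NULL)
        return -1;
    fputs("ansh", fp);
    return fclose(fp);
}

/* Approach 2: build the full path and create the file directly. */
int create_via_path(const char *dir, const char *name)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    FILE *fp = fopen(path, "a+");
    if (fp == NULL)
        return -1;
    fputs("ansh", fp);
    return fclose(fp);
}

/* Time `count` file creations with the given approach.
 * Returns elapsed seconds, or -1.0 if any creation fails. */
double time_creates(int (*create)(const char *, const char *),
                    const char *dir, const char *prefix, int count)
{
    struct timespec t0, t1;
    char name[64];

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < count; i++) {
        snprintf(name, sizeof name, "%s%d", prefix, i);
        if (create(dir, name) != 0)
            return -1.0;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Call time_creates() once with each function on the same directory and a large count, and repeat the runs a few times, since the page cache and directory state will differ between the first and later runs.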

Actually, you are perhaps considering writing a large number of files in the same directory. Recent file systems have indexed directories, but some older file systems don't (so with such old file systems, file creation and lookup operations are linear in the number of entries in each directory).

An old trick, when writing many thousands of files into some directory /foo, is to create a few dozen subdirectories like /foo/1/ and /foo/2/ and populate each such subdirectory with only a few hundred entries.

Another reason to do so is that an interactive shell (with file completion) is not very happy in directories containing tens of thousands of entries.
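One way to pick the subdirectory is to hash the file name into a fixed number of buckets. The sketch below is only an illustration of that idea: the function names are hypothetical, the djb2 string hash is just one convenient choice, and the bucket directories (/foo/0/, /foo/1/, ...) are assumed to have been created beforehand.

```c
#include <stdio.h>

/* Simple string hash (djb2); any reasonably uniform hash would do. */
unsigned long hash_name(const char *name)
{
    unsigned long h = 5381;
    for (const unsigned char *p = (const unsigned char *)name; *p != '\0'; p++)
        h = h * 33 + *p;
    return h;
}

/* Write "<base>/<bucket>/<name>" into out, where bucket is in [0, buckets).
 * Returns 0 on success, -1 if buckets is 0 or the path does not fit. */
int sharded_path(char *out, size_t outlen, const char *base,
                 const char *name, unsigned buckets)
{
    if (buckets == 0)
        return -1;
    unsigned long bucket = hash_name(name) % buckets;
    int n = snprintf(out, outlen, "%s/%lu/%s", base, bucket, name);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Because the bucket is derived from the name alone, the same function can be used later to find a file again without scanning the subdirectories.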

As always, your mileage may vary.

If you want to have many thousands of small files, you could consider other solutions, like a database (e.g. with a MySQL or PostgreSQL client library and server) or a GDBM indexed file.

Upvotes: 2
