Nister

Reputation: 226

gnuplot slow when plotting large data set as animation

I'm trying to make an "animated" plot of a lot of data (the positions of 1000 particles) from a big text file with a script like this:

set terminal wxt size 1000,600
k = 999999
N = 999
do for [i=0:k] {
    plot for [j=0:N-1] "pos.txt" using 2*j+1:2*j+2 every ::2*i+1::2*i+1 ls 1 pt 7 ps 2 notitle
}

In the file, each line holds the X and Y coordinates of all the points I want to plot at a given time. I'm using every to plot all the data in one row and then move on to the next row.
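For reference, a test file with this layout can be produced with a short Python script along these lines (the particle and step counts are just illustrative values, and the random walk only stands in for my actual simulation):

import numpy as np

n_steps = 1000       # number of time steps (illustrative value)
n_particles = 1000   # particles per time step

# Each row holds x1 y1 x2 y2 ... for one time step; a random walk
# keeps the points moving smoothly between frames.
pos = np.cumsum(np.random.normal(scale=0.01, size=(n_steps, 2 * n_particles)), axis=0)
np.savetxt("pos.txt", pos, fmt="%.6f")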

The output is something like this (1000 particles moving):

However, the plotting is way too slow and I don't know what I can do to make it faster. It takes 5 or more seconds to plot each row. The file is only a few MB. Should I change the terminal, or the way I store the data? I suspect there is a problem with how gnuplot loads a big file.
Some particles disappear in the simulation, so I also get the warning line 14: warning: Skipping data file with no valid points when the index j (well, 2*j+1) goes past the number of particles. I tried reading the number of particles at each step instead, but that is even slower. Many thanks.

Upvotes: 6

Views: 5677

Answers (2)

Christoph

Reputation: 48430

If performance is critical, you may want to consider a completely different data format. Although changing the layout of the ASCII file gives a huge improvement, it scales badly, because gnuplot must always scan the data file from the beginning to find the position where it should start reading. I did some testing: plotting the first 1000 frames took 60s, whereas frames 9000 to 10000 took 600s.

You need a data format that allows you to seek to any data set in constant time. For my thesis I saved all my experimental data (huge data sets) in hdf5, and the external utility h5totxt can then extract the desired data set. With this format the position of the requested data set can be computed without scanning the whole file, so the access time is independent of the frame number.

For testing, I used the following Python script to generate a test data file points.h5:

from numpy import random
import h5py

# 10000 frames, 1000 particles, 2 coordinates (x, y) each
P = random.normal(size=(10000, 1000, 2))
f = h5py.File('points.h5', 'w')
f.create_dataset('points', data=P)
f.close()

The gnuplot script for plotting is

set terminal wxt size 1000,600
k = 9999
do for [i=0:k] {
  plot sprintf("< h5totxt -s ' ' -x %d points.h5", i) using 1:2 ls 1 pt 7 ps 2 title sprintf("%d", i)
}

Now, plotting 1000 frames takes 40s, no matter which frames you take (0 to 1000 or 9000 to 10000).
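As a side note, the constant-time access can also be checked from Python directly with h5py. This is just a quick sketch, not part of the gnuplot workflow above; it reads a single frame of points.h5 without touching the rest of the file:

import h5py

# Open points.h5 (created by the script above) and read one frame;
# h5py only loads the requested slice, not the whole data set.
with h5py.File('points.h5', 'r') as f:
    frame = f['points'][9000]      # shape (1000, 2): x/y of all particles in frame 9000
    print(frame[:5])               # first five particle positions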

Upvotes: 3

Miguel

Reputation: 7627

I suspect gnuplot is reading the whole file every time you plot, as opposed to reading up to the line in question, then the next line, and so on. One possible strategy is to split the particle trajectories into separate files, but what should really help is to replace the plot for with a single plot plus a block selection with every: instead of selecting the columns for each particle, you put the positions of all particles for the same time step in the same block.

Now your data looks something like this:

x1 y1 x2 y2 x3 y3 # Time step 1
x1 y1 x2 y2 x3 y3 # Time step 2

And gnuplot needs to read the file once for every time step and particle. If you structure the file as follows (note one blank line between blocks):

# Time step 1
x1 y1
x2 y2
x3 y3

# Time step 2
x1 y1
x2 y2
x3 y3

Then you don't need the plot for; instead, just select the block containing all the particles by inserting one extra colon in every:

set terminal wxt size 1000,600
k = 999999
# N = 999 is not needed anymore
do for [i=0:k] {
    plot "pos.txt" every :::i::i
}

The code above reads the file once per time step, rather than once per time step and per particle, and plots all the particles at once.
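If your data is currently in the wide one-row-per-time-step layout from the question, a short script can reshape it into this block format. The following is only a rough sketch: it assumes every row has the same number of columns (no disappearing particles), and the output file name pos_blocks.txt is just a placeholder.

import numpy as np

# Reshape the wide "x1 y1 x2 y2 ..." rows into gnuplot blocks of "x y" pairs,
# one block per time step, separated by a single blank line.
data = np.loadtxt("pos.txt")                 # shape: (n_steps, 2 * n_particles)
with open("pos_blocks.txt", "w") as out:     # output name is just a placeholder
    for step, row in enumerate(data):
        out.write("# Time step %d\n" % step)
        for x, y in row.reshape(-1, 2):      # iterate over (x, y) pairs
            out.write("%g %g\n" % (x, y))
        out.write("\n")                      # blank line ends the block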

Upvotes: 3
