Reputation: 144
Hi, I am trying to write a gnuplot script that produces a CDF graph from data generated by another program.
The data looks like this:
col1 col2 col3 col4 col5
ABCD11 19.8 1.13 129 2
AABC32 14.3 2.32 109 2
AACd12 19.1 0.21 103 2
I want to plot the CDF of column 2. Note that the data in col2 might not be sorted.
To run the script, I use an online gnuplot tool.
The script I tried is:
set output 'out.svg'
set terminal svg size 600,300 enhanced fname 'arial' fsize 10 mousing butt solid
set xlabel "X"
set ylabel "CDF"
set style line 2 lc rgb 'black' lt 1 lw 1
set xtics format "" nomirror rotate by -10 font ", 7"
set ytics nomirror
set grid ytics
set key box height .4 width -1 box right
set nokey
set title "CDF of X"
a=0
#gnuplot 4.4+ functions are now defined as:
#func(variable1,variable2...)=(statement1,statement2,...,return value)
cumulative_sum(x)=(a=a+x,a)
plot "data.txt" using 1:(cumulative_sum($2)) with linespoints lt -1
Upvotes: 0
Views: 3149
Reputation: 4095
You can use the cumulative smoothing style to get a CDF from data; see help smooth cumulative:
plot "test.dat" u 2:(1) smooth cumulative w lp
Note that with u 2:(1) the y axis shows a cumulative count of points; smooth cnormalize scales the result to the range 0 to 1 instead.
Upvotes: 1
Reputation: 13087
If you want to calculate the (running) cumulative sum of the values from the second column using sorted values, then you could slightly extend your approach based on awk. To be more specific, the command would be:
tail -n+2 'test.txt' | sort -k2,2n | awk '{s+=$2; print NR, s}'
Here, tail strips off the header (skips the first line), sort sorts numerically according to the second column, and finally awk calculates the cumulative sum as a function of the number of records/items.
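The pipeline above gives a running sum of the values. If what is wanted is the empirical CDF of column 2 (the fraction of values at or below each value), a variant along the following lines might work. This is only a sketch: the function name ecdf is made up for illustration, and it assumes whitespace-separated columns, exactly one header line, and a C locale for the decimal point.

```shell
# ecdf FILE: print "value cumulative_fraction" for column 2 of FILE.
# Hypothetical helper; assumes one header line and whitespace-separated columns.
ecdf() (
    # Assumption: force the C locale so "." is the decimal separator
    # for both sort -n and awk's printf.
    export LC_ALL=C
    tail -n +2 "$1" |          # skip the header line
        sort -k2,2n |          # sort numerically by column 2
        awk '
            { v[NR] = $2 }     # store the sorted values
            END {
                # the i-th smallest value gets cumulative fraction i/N
                for (i = 1; i <= NR; i++)
                    printf "%s %.4f\n", v[i], i / NR
            }'
)

# Example with the three sample rows from the question:
#   ecdf test.txt
#   14.3 0.3333
#   19.1 0.6667
#   19.8 1.0000
```

Output like this can be read by gnuplot through a pipe, for example plot '< sh ecdf.sh test.txt' using 1:2 with steps, assuming the function is saved in a hypothetical script ecdf.sh that calls it on its first argument.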
Upvotes: 0