Andy_Jake

Reputation: 144

Plot CDF with GNUPLOT with data not being sorted

Hi, I am trying to write a gnuplot script that produces a CDF graph from data generated by another program.

The data looks like this:

col1    col2    col3    col4    col5
ABCD11  19.8    1.13    129 2
AABC32  14.3    2.32    109 2
AACd12  19.1    0.21    103 2

I want to plot the CDF of column 2. The catch is that the data in col2 might not be sorted.

To run the script I use an online tool.

The script I tried is:

set output 'out.svg'
set terminal svg size 600,300 enhanced fname 'arial' fsize 10 mousing butt solid
set xlabel "X"
set ylabel "CDF"
set style line 2 lc rgb 'black' lt 1 lw 1
set xtics format "" nomirror rotate by -10 font ", 7"
set ytics nomirror

set grid ytics
set key box height .4 width -1 box right
set nokey
set title "CDF of X"

a=0
#gnuplot 4.4+ functions are now defined as:  
#func(variable1,variable2...)=(statement1,statement2,...,return value)
cumulative_sum(x)=(a=a+x,a)
plot "data.txt" using 1:(cumulative_sum($2)) with linespoints lt -1

Upvotes: 0

Views: 3149

Answers (2)

user8153

Reputation: 4095

You can use the cumulative smoothing style to get a CDF from the data. It sorts the points by x before accumulating, so the input does not need to be sorted; see help smooth cumulative:

plot "test.dat" u 2:(1) smooth cumulative w lp
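With a weight of 1 per point, the curve above ends at the number of samples rather than at 1. A minimal sketch of one way to get the normalizing count from the shell, assuming the file has exactly one header line as in the question (the variable name N is just an illustration):

```shell
# Recreate the question's sample data under the file name used in this answer.
cat > test.dat <<'EOF'
col1    col2    col3    col4    col5
ABCD11  19.8    1.13    129 2
AABC32  14.3    2.32    109 2
AACd12  19.1    0.21    103 2
EOF

# Count the data rows, excluding the one-line header.
# tr strips the padding that some wc implementations add.
N=$(tail -n +2 test.dat | wc -l | tr -d ' ')
echo "$N"
```

With N known, using u 2:(1.0/N) in place of u 2:(1) should make the curve end at 1. Recent gnuplot versions also offer smooth cnormalized, which normalizes for you; check help smooth on your installation.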


Upvotes: 1

ewcz

Reputation: 13087

If you want to calculate the running cumulative sum of the values in the second column, taken in sorted order, you could preprocess the data with sort and awk before plotting. The command would be:

tail -n+2 'test.txt' | sort -k2,2n | awk '{s+=$2; print NR, s}'

Here, tail strips off the header (skips the first line), sort sorts numerically according to the second column, and finally awk calculates the cumulative sum as a function of the number of records/items.
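The pipeline above accumulates the values themselves. If what you want is the empirical CDF (the fraction of samples less than or equal to each value), a variant of the same pipeline could look like the sketch below. The file names data.txt and cdf.txt are placeholders, and LC_ALL=C is set on the assumption that the decimal separator is a dot:

```shell
# Sample data in the question's layout: one header line, whitespace-separated columns.
cat > data.txt <<'EOF'
col1    col2    col3    col4    col5
ABCD11  19.8    1.13    129 2
AABC32  14.3    2.32    109 2
AACd12  19.1    0.21    103 2
EOF

# Skip the header, sort numerically on column 2, then print "value rank/N".
# awk buffers the sorted values so that the total count NR is known in END.
tail -n +2 data.txt \
  | LC_ALL=C sort -k2,2n \
  | awk '{ v[NR] = $2 } END { for (i = 1; i <= NR; i++) printf "%s %.4f\n", v[i], i / NR }' \
  > cdf.txt

cat cdf.txt
```

The resulting two-column file can then be plotted directly, for example with plot "cdf.txt" using 1:2 with steps.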

Upvotes: 0
