Reputation: 327
I'm trying to plot (with gnuplot) some COVID-19 data contained in a CSV file. The first row holds the dates, and each following row holds the case counts for one state, one column per date. I'd like to make a single plot for each state (each row), but I'm not having much luck. Any help? This is my plot script so far. I'm using plot for [col=5:30:1]... in the script because the first 4 columns are the state name and geolocation. I thought I'd just concentrate on the data points for now and eventually figure out how to display the state name on the plot as well. I've grep'd the USA data out of the main CSV data to create "us.dat":
set key autotitle columnhead
set term png size 1024, 768
set key outside
set datafile separator ","
set title 'mygraph'
set ylabel 'count'
set xlabel 'time'
set grid
set term png
set output "/tmp/covid19.png"
plot for [col=5:30:1] "us.dat" using col
And a snip of the "us.dat" file:
Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,1/29/20,1/30/20,1/31/20,2/1/20,2/2/20,2/3/20,2/4/20,2/5/20,2/6/20,2/7/20,2/8/20,2/9/20,2/10/20,2/11/20,2/12/20,2/13/20,2/14/20,2/15/20,2/16/20,2/17/20,2/18/20,2/19/20,2/20/20,2/21/20,2/22/20,2/23/20,2/24/20,2/25/20,2/26/20,2/27/20,2/28/20,2/29/20,3/1/20,3/2/20,3/3/20,3/4/20,3/5/20,3/6/20,3/7/20,3/8/20,3/9/20,3/10/20,3/11/20
Washington,US,47.4009,-121.4905,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,267,366
New York,US,42.1657,-74.9481,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,173,220
California,US,36.1162,-119.6816,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,144,177
Massachusetts,US,42.2302,-71.5301,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,92,95
The resulting plot isn't quite right, however.
Upvotes: 4
Views: 433
Reputation: 3734
Here is a pure gnuplot version:
$data <<EOD
Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,1/29/20,1/30/20,1/31/20,2/1/20,2/2/20,2/3/20,2/4/20,2/5/20,2/6/20,2/7/20,2/8/20,2/9/20,2/10/20,2/11/20,2/12/20,2/13/20,2/14/20,2/15/20,2/16/20,2/17/20,2/18/20,2/19/20,2/20/20,2/21/20,2/22/20,2/23/20,2/24/20,2/25/20,2/26/20,2/27/20,2/28/20,2/29/20,3/1/20,3/2/20,3/3/20,3/4/20,3/5/20,3/6/20,3/7/20,3/8/20,3/9/20,3/10/20,3/11/20
Washington,US,47.4009,-121.4905,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,267,366
New York,US,42.1657,-74.9481,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,173,220
California,US,36.1162,-119.6816,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,144,177
Massachusetts,US,42.2302,-71.5301,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,92,95
EOD
N = 50
array X[N]
array Y[N]
set datafile separator ","
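# A dummy plot that copies one row of the data into arrays:
#   header row ($0==0): store the date strings of columns 5..N+4 in X
#   row whose first column is "Washington": store the counts in Y
plot $data using ($0==0? sum[i=1:N](X[i]=strcol(i+4), 0) :\
(strcol(1) eq "Washington")? sum[i=1:N](Y[i]=column(i+4)) : $0, $0) : 0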
set xdata time
set timefmt "%m/%d/%y"
plot X using (X[$1]):(Y[$1]) with linespoints pt 7
Explanation:
First, there is a dummy plot. When the first row is read ($0==0), a loop over all columns stores the dates in array X. Similarly, all columns are stored in array Y when the row for Washington is read.
The number of columns and their offset must be known in advance.
The sum function is only (mis)used as a loop. Since the date row contains strings, which cannot be summed, the ", 0" is appended so that each term of the sum evaluates to a number.
Upvotes: 3
Reputation: 939
A possible solution is to use awk. With it you can transpose your file and then use gnuplot normally (thanks also to this awesome answer: An efficient way to transpose a file in Bash). You can even do it inline inside gnuplot.
Washington can be plotted as follows.
set xdata time
set timefmt "%m/%d/%y"
pl "<awk -F, '{ for (i=5; i<=NF; i++) { a[NR,i] = $i} } NF>p { p = NF } END { for(j=5; j<=p; j++) {str=a[1,j];for(i=2; i<=NR; i++){str=str\" \"a[i,j];}print str}}' us.dat" using 1:2 w l title "Washington"
In the transposed data, column 3 is New York, column 4 is California, and column 5 is Massachusetts.
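If the inline one-liner is hard to follow, the same transposition can be run as a separate step and the result inspected before plotting. The following is only a sketch: the file names us_sample.dat and us_sample_t.dat are made up for illustration, and the sample is the question's data trimmed to two states and the last three dates.

```shell
# Hypothetical sample input: header plus two states, trimmed to three dates.
cat > us_sample.dat <<'EOF'
Province/State,Country/Region,Lat,Long,3/9/20,3/10/20,3/11/20
Washington,US,47.4009,-121.4905,0,267,366
New York,US,42.1657,-74.9481,0,173,220
EOF

# Transpose columns 5..NF: each output line is "date count_state1 count_state2 ...".
awk -F, '
  { for (i = 5; i <= NF; i++) cell[NR, i] = $i }   # store every date/count cell
  NF > ncols { ncols = NF }                        # remember the widest row
  END {
    for (j = 5; j <= ncols; j++) {
      line = cell[1, j]                            # date taken from the header row
      for (i = 2; i <= NR; i++) line = line " " cell[i, j]
      print line
    }
  }
' us_sample.dat > us_sample_t.dat

cat us_sample_t.dat
```

Each line of the output file starts with the date, followed by one count per state in the original row order, so with the xdata/timefmt settings above, using 1:2 plots the first state and using 1:3 the second.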
Upvotes: 2