Bari Tala

Reputation: 57

gnuplot plot histogram on strings

I have a single column of data formatted like this:

Chicago
Chicago
New York
Chicago
Boulder
Boulder
Chicago
Los Angeles
San Diego
Chicago

I'm trying to plot the counts for each city in the column using gnuplot. Any ideas?

Upvotes: 0

Views: 263

Answers (2)

theozh

Reputation: 25734

This is almost a duplicate of "gnuplot, non-numeric repeated x values". The difference is that you don't have a second column with numbers; instead, you want to count the occurrences of each item.

Just for the record, this can also be done with gnuplot alone. Under Windows, for example, sort and uniq are not available by default (unless you have installed CoreUtils for Windows). However, since gnuplot has no built-in way to sort strings, the bars will appear in order of first occurrence.

Since you haven't enclosed the town names "New York", "Los Angeles" and "San Diego" in quotes, gnuplot will by default treat each of them as two columns. You can tell gnuplot to treat each line as a single column by setting the datafile separator to a character that doesn't occur in your data, e.g. set datafile separator "\n" or, as in the script below, "|".

Data: SO46800152.dat

Chicago
Chicago
New York
Chicago
Boulder
Boulder
Chicago
Los Angeles
San Diego
Chicago

Script: (works with at least gnuplot>=4.4.0, March 2010)

You might need to adjust set offset differently for gnuplot 4.x and 5.x.

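### plot histogram on strings
reset

FILE = "SO46800152.dat"

# "|" does not occur in the data, so each line is read as a single column
set datafile separator "|"

# list collects each distinct name once, in quotes, followed by its number,
# e.g. "Chicago" 1 "New York" 2; c counts the distinct names seen so far
list = ''
c = 0
# index(): the number stored after s in list, or 0 if s is not in list
index(list,s) = (_n=strstrt(list,s)) ? int(word(list[_n+strlen(s):],1)) : 0
# myX(): append the quoted name to list on first occurrence, then return its number
myX(col)      = (_s='"'.strcol(col).'"', strstrt(list, _s) ? '' : list=list.sprintf('%s %d ',_s,c=c+1), index(list,_s))

set offset 0,0.5,0.5,0
set yrange[0:]
set xrange [0.5:]
set boxwidth 0.8
set style fill solid 0.4

# smooth freq sums the 1s for each x value, i.e. counts the occurrences per name
plot FILE u(myX(1)):(1):xtic(1) smooth freq w boxes lc rgb "red" notitle
### end of script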


Upvotes: 0

Christoph

Reputation: 48390

Gnuplot has no built-in command for this. You could use the sort and uniq command-line tools to preprocess your data:

set boxwidth 0.7
set yrange [0:*]
set style fill solid noborder
plot "< sort 'file.dat' | uniq -c" u 0:1:xtic(2) with boxes
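
It can help to look at what the sort and uniq pipeline above actually hands to gnuplot. uniq -c prints the count first and the name second, and with the default whitespace separator a name such as "New York" is split into two columns, so xtic(2) would show only "New". The following is a minimal sketch, assuming a POSIX shell with sort, uniq and sed available; the file names cities.dat and counts.dat are made up for illustration. It wraps each name in double quotes so that it stays in one column:

```shell
# Hypothetical input file holding the ten city names from the question.
printf '%s\n' Chicago Chicago 'New York' Chicago Boulder Boulder \
    Chicago 'Los Angeles' 'San Diego' Chicago > cities.dat

# Count the occurrences, then rewrite lines such as "      5 Chicago"
# as: 5 "Chicago"
# so that multi-word names stay in one whitespace-separated column.
sort cities.dat | uniq -c |
    sed 's/^ *\([0-9][0-9]*\) \(.*\)$/\1 "\2"/' > counts.dat

cat counts.dat
```

With the quoted file, plot "counts.dat" u 0:1:xtic(2) with boxes should label each bar with the full city name, since gnuplot reads a double-quoted string in a data file as a single column.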

Upvotes: 4
