Reputation: 313
I have a data set (filename 'data') like this:
a 10.1
b 10.1
c 10.2
b 15.56
a 3.20
and I would like to plot this data as points. When I try:
plot 'data' using 2:xticlabels(1)
I get a plot with 5 x-axis tick labels (a, b, c, b, a), but I want only 3 (a, b, c; the order is not important), with all 5 y values plotted. Is this possible?
My real data file looks like this:
2-8-16-17-18 962.623408
2-3-4-5-6 -97.527840
2-8-9-10-11 962.623408
2-8-9-10-11 937.101308
2-3-4-5-6 37.101308
and has about a thousand records.
I don't know how to use mgilson's code, but it gave me an idea. I added an extra index column to the data file:
1 a 10.1
2 b 10.1
3 c 10.2
2 b 15.56
1 a 3.20
after which plotting in gnuplot is easy:
plot 'data' u 1:3
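I use Perl, so my script looks like this:
#!/usr/bin/perl
use strict;
use warnings;

my %non_numeric;       # maps each label to its integer index
my $index_number = 0;  # last index assigned so far

while (my $line = <>)
{
    my ($col1, $col2) = split(" ", $line);
    next unless defined $col2;  # skip blank or incomplete lines
    # assign the next index the first time a label is seen
    if (not exists $non_numeric{$col1})
    {
        $index_number++;
        $non_numeric{$col1} = $index_number;
    }
    print "$non_numeric{$col1}\t$col1\t$col2\n";
}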
Upvotes: 4
Views: 1258
Reputation: 25704
Just for the record, there is a not-too-complicated gnuplot-only solution!
As the OP already wrote, it cannot simply be done with plot FILE u 2:xtic(1).
The solution is a variation and mix of Christoph's answer to "Gnuplot, plotting a graph with text on y axis" and my answer to "How do I group strings and their data using Gnuplot?"
How it works:
myX(col): while plotting the data row by row, if the string in the x-column is not yet found in the string variable list, append it surrounded by quotes, increase the counter c by 1, and add the value of c to the list as well. In either case, return the index of the current string. At the end, the value of the string list in the example below will be:
"a" 1 "b" 2 "c" 3
index(list,s): returns the index of s in list by matching the substring s (check help strstrt) and extracting the number that follows it.
Data: SO12123578.dat
a 10.1
b 10.1
c 10.2
b 15.56
a 3.20
Script: (works at least with gnuplot>=4.4.0, March 2010)
### use string values as x-values
reset
FILE = "SO12123578.dat"
list = ''
c = 0
index(list,s) = (_n=strstrt(list,s)) ? int(word(list[_n+strlen(s):],1)) : 0
myX(col) = (_s='"'.strcol(col).'"', strstrt(list, _s) ? '' : list=list.sprintf('%s %d ',_s,c=c+1), index(list,_s))
set offset 1,1,1,1
plot FILE u(myX(1)):2:xtic(1) w p pt 7 lc rgb "red" notitle
### end of script
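For readers who find the one-line gnuplot functions dense, the same bookkeeping can be sketched in Python. This is only an illustration of the idea, not part of the gnuplot solution: the names make_label_indexer, my_x and seen are made up for this sketch, and a dictionary stands in for the gnuplot string variable list.

```python
def make_label_indexer():
    """Return (my_x, seen): my_x maps each distinct label to 1, 2, 3, ...

    Indices are handed out in first-seen order, mirroring myX(col) above.
    """
    seen = {}  # label -> index; plays the role of the gnuplot string `list`

    def my_x(label):
        if label not in seen:
            # first occurrence: assign the next index (c = c + 1 in the script)
            seen[label] = len(seen) + 1
        return seen[label]

    return my_x, seen


my_x, seen = make_label_indexer()
labels = ["a", "b", "c", "b", "a"]  # first column of the sample data
xs = [my_x(label) for label in labels]
print(xs)  # -> [1, 2, 3, 2, 1]
```

Repeated labels get the x-value of their first occurrence, which is why the five points end up on only three tick positions.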
Result:
Upvotes: 1
Reputation: 309831
I doubt that you can come up with a gnuplot-only solution. However, this should work as long as you have Python 2.5 or newer installed on your system. (It works with your test data.)
import sys
import collections

data = collections.defaultdict(list)
keys = []

# build a mapping which maps values to xticlabels (hereafter "keys")
# Keep a second keys list so we can figure out the order we put things into
# the mapping (dict)
with open(sys.argv[1]) as f:
    for line in f:
        key, value = line.split()
        data[key.strip()].append(value)
        keys.append(key.strip())

def unique(seq):
    """
    Simple function to make a sequence unique while preserving order.
    Returns a list
    """
    seen = set()
    seen_add = seen.add
    return [x for x in seq if x not in seen and not seen_add(x)]

keys = unique(keys)  # make keys unique

# write the keys alongside 1 element from the corresponding list.
for k in keys:
    sys.stdout.write('%s %s\n' % (k, data[k].pop()))

# Two blank lines tells gnuplot the following is another dataset
sys.stdout.write('\n\n')

# Write the remaining data lists in order assigning x-values
# for each list (starting at 0 and incrementing every time we get
# a new key)
for i, k in enumerate(keys):
    v = data[k]
    for item in v:
        sys.stdout.write('%d %s\n' % (i, item))
Now the script to plot this:
set style line 1 lt 1 pt 1
plot '<python pythonscript.py data' i 0 u 2:xticlabels(1) ls 1,\
'' i 1 u 1:2 ls 1 notitle
Here's how this works. When you do something like plot ... u 2:xticlabels(1), gnuplot implicitly assigns sequential integer x-values to the data points (starting at 0). The python script rearranges the data to make use of this fact. Basically, I create a mapping from each "key" in the first column to the list of values that correspond to that key. In other words, in your dummy datafile, the key 'a' maps to the list of values [10.1, 3.2]. However, python dictionaries (mappings) aren't ordered, so I keep a second list which maintains the order (so that your axes are labelled 'a', 'b', 'c' instead of 'c', 'a', 'b', for instance). I make sure that this key list is unique so that I can use it to print the necessary data.
I write the data in 2 passes. The first pass prints only one value from each list along with its "key". The second pass prints the rest of the values along with the x-value that gnuplot will implicitly assign to them. Between the two datasets, I insert 2 blank lines so that gnuplot can tell them apart using the index keyword (here abbreviated to i).
Now we just need to plot the two datasets accordingly. First we set a linestyle so that both passes will have the same style when plotted. Then we plot index 0 (the first dataset) with the xticlabels and index 1 using the x-value,y-value pairs the python script calculated (u 1:2). Sorry the explanation is long (and that the original version was slightly buggy). Good luck and happy gnuplotting!
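To make the two-pass layout concrete, here is a compact sketch that applies the same rearrangement to the sample data and prints what gnuplot would read. It is a variant for illustration, not the script above: the names rearrange, labelled and indexed are made up here, and it uses collections.OrderedDict (Python 2.7 or newer) instead of the defaultdict plus separate key list.

```python
from collections import OrderedDict


def rearrange(lines):
    """Split label/value rows into the two blocks described above.

    Returns (labelled, indexed): `labelled` holds one (label, value) pair per
    distinct label, `indexed` holds (x, value) pairs for the remaining values,
    where x is the position gnuplot implicitly gives that label (from 0).
    """
    groups = OrderedDict()  # label -> list of values, in first-seen order
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        label, value = line.split()
        groups.setdefault(label, []).append(value)

    # first pass: the last value of each label, like data[k].pop() above
    labelled = [(label, values[-1]) for label, values in groups.items()]
    # second pass: everything that is left, tagged with the label's x position
    indexed = [
        (x, value)
        for x, values in enumerate(groups.values())
        for value in values[:-1]
    ]
    return labelled, indexed


sample = ["a 10.1", "b 10.1", "c 10.2", "b 15.56", "a 3.20"]
labelled, indexed = rearrange(sample)
for label, value in labelled:
    print(label, value)
print("\n")  # two blank lines separate the gnuplot datasets
for x, value in indexed:
    print(x, value)
```

For the sample data the first block is a 3.20, b 15.56, c 10.2, and the second block is 0 10.1 and 1 10.1, so every y value appears exactly once and the second block lands on the tick positions of a and b.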
Upvotes: 1