kato sheen

Reputation: 313

gnuplot, non-numeric repeated x values

I have a data set (file name 'data') like this:

a 10.1
b 10.1
c 10.2
b 15.56
a 3.20

and I would like to plot this data as points. When I try:

plot 'data' using 2:xticlabels(1)

I get a plot with 5 x-axis values (a, b, c, b, a), but I want only 3 (a, b, c; the order is not important) on a plot with all 5 y values. Is it possible?

My real data file looks like this:

2-8-16-17-18   962.623408
2-3-4-5-6      -97.527840
2-8-9-10-11    962.623408
2-8-9-10-11    937.101308
2-3-4-5-6       37.101308

and has about a thousand records.


I don't know how to use mgilson's code, but it gave me an idea. I added an additional column (an index) to the data file:

1 a 10.1 
2 b 10.1 
3 c 10.2 
2 b 15.56 
1 a 3.20

after which plotting in gnuplot is easy:

plot 'data' u 1:3 

I use Perl, so my script looks like this:

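#!/usr/bin/perl
use strict;
use warnings;

# Give each distinct label (column 1) a 1-based index in order of first
# appearance, then print: index <TAB> label <TAB> value
my %non_numeric;
my $index_number = 0;
while (my $line = <>)
{
   my ($col1, $col2) = split(" ", $line);
   next unless defined $col2;   # skip blank or malformed lines
   if( not exists $non_numeric{$col1} )
   {
      $index_number++;
      $non_numeric{$col1} = $index_number;
   }
   print "$non_numeric{$col1}\t$col1\t$col2\n";
}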
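
For readers who do not use Perl, the same index-column idea can be sketched in Python. This is only an illustration: the function name add_index_column and the variable sample are made up here and are not part of the original script.

```python
def add_index_column(lines):
    """Prefix each 'label value' line with a 1-based index unique to its label."""
    index_of = {}   # label -> index, assigned in order of first appearance
    out = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        label, value = fields[0], fields[1]
        if label not in index_of:
            index_of[label] = len(index_of) + 1
        out.append("%d\t%s\t%s" % (index_of[label], label, value))
    return out


# Hypothetical sample mirroring the dummy data in the question
sample = ["a 10.1", "b 10.1", "c 10.2", "b 15.56", "a 3.20"]
indexed = add_index_column(sample)
print("\n".join(indexed))
```

The output has the same three columns as the indexed file shown above (index, label, value), separated by tabs.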

Upvotes: 4

Views: 1258

Answers (2)

theozh

Reputation: 25704

Just for the record, there is a not-too-complicated gnuplot-only solution. As the OP already wrote, it cannot be done simply with plot FILE u 2:xtic(1).

The solution is a variation and mix of Christoph's answer to "Gnuplot, plotting a graph with text on y axis" and my answer to "How do I group strings and their data using Gnuplot?"

How it works:

  • function myX(col): while plotting the data row by row, if the string in your x-column is not yet found in the string variable list, append it surrounded by quotes, increase the counter c by 1, and append the value of c to the list as well. Then return the index of the current string. At the end, the value of list in the example below will be:

"a" 1 "b" 2 "c" 3 

  • function index(list,s): returns the index of s in list by locating the substring s (check help strstrt) and extracting the number that follows it.
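
To make the lookup logic easier to follow outside gnuplot, here is a rough Python emulation of the two functions. It is only a sketch of the idea, not gnuplot code: the names index_in, make_my_x and state are invented for this illustration, and the closure stands in for gnuplot's global variables list and c.

```python
def index_in(lst, s):
    """Emulate index(list,s): the number following s in lst, or 0 if s is absent."""
    pos = lst.find(s)
    if pos < 0:
        return 0
    # Take the first whitespace-separated word after s, like word(...,1)
    return int(lst[pos + len(s):].split()[0])


def make_my_x():
    """Build an emulation of myX plus the state it mutates (list and c)."""
    state = {"list": "", "c": 0}

    def my_x(label):
        quoted = '"' + label + '"'
        if quoted not in state["list"]:
            # First time this label is seen: append '"label" n ' to the list
            state["c"] += 1
            state["list"] += "%s %d " % (quoted, state["c"])
        return index_in(state["list"], quoted)

    return my_x, state


my_x, state = make_my_x()
xs = [my_x(label) for label in ["a", "b", "c", "b", "a"]]
print(xs)             # [1, 2, 3, 2, 1]
print(state["list"])  # "a" 1 "b" 2 "c" 3
```

The surrounding quotes matter: searching for the quoted string keeps a label such as a from matching inside a longer label such as ab.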

Data: SO12123578.dat

a  10.1
b  10.1
c  10.2
b  15.56
a  3.20

Script: (works at least with gnuplot>=4.4.0, March 2010)

### use string values as x-values
reset

FILE = "SO12123578.dat"

list = ''
c = 0
index(list,s) = (_n=strstrt(list,s)) ? int(word(list[_n+strlen(s):],1)) : 0
myX(col)      = (_s='"'.strcol(col).'"', strstrt(list, _s) ? '' : list=list.sprintf('%s %d ',_s,c=c+1), index(list,_s))

set offset 1,1,1,1

plot FILE u(myX(1)):2:xtic(1) w p pt 7 lc rgb "red" notitle
### end of script

Result: [image: resulting plot]

Upvotes: 1

mgilson

Reputation: 309831

I doubt that you can come up with a gnuplot-only solution. However, this should work as long as you have Python 2.5 or newer installed on your system. (It works with your test data.)

from __future__ import with_statement  # only needed on Python 2.5
import sys
import collections

data = collections.defaultdict(list)
keys = []

# build a mapping which maps values to xticlabels (hereafter "keys")
# Keep a second keys list so we can figure out the order we put things into
# the mapping (dict)
with open(sys.argv[1]) as f:
    for line in f:
        key,value = line.split()
        data[key.strip()].append( value )
        keys.append(key.strip())

def unique(seq):
    """
    Simple function to make a sequence unique while preserving order.
    Returns a list
    """
    seen = set()
    seen_add = seen.add
    return [ x for x in seq if x not in seen and not seen_add(x) ]

keys = unique(keys) #make keys unique

#write the keys alongside 1 element from the corresponding list.
for k in keys:
    sys.stdout.write( '%s %s\n' % (k, data[k].pop()) )

# Two blank lines tells gnuplot the following is another dataset
sys.stdout.write('\n\n')

# Write the remaining data lists in order assigning x-values
# for each list (starting at 0 and incrementing every time we get
# a new key)
for i,k in enumerate(keys):
    v = data[k]
    for item in v:
        sys.stdout.write( '%d %s\n' % (i, item) )

Now the script to plot this:

set style line 1 lt 1 pt 1
plot '<python pythonscript.py data' i 0 u 2:xticlabels(1) ls 1,\
     '' i 1 u 1:2 ls 1 notitle

Here's how this works. When you do something like plot ... u 2:xticlabels(1), gnuplot implicitly assigns sequential integer x-values to the data points (starting at 0). The Python script rearranges the data to make use of this fact.

Basically, I create a mapping from the "keys" in the first column to a list of the values that correspond to each key. In other words, in your dummy data file, the key 'a' maps to the list of values [10.1, 3.2]. However, Python dictionaries (mappings) aren't ordered, so I keep a second list which maintains the order (so that your axes are labelled 'a', 'b', 'c' instead of 'c', 'a', 'b', for instance). I make sure that this list of keys is unique so that I can use it to print the necessary data.

I write the data in 2 passes. The first pass prints only one value from each list along with its key. The second pass prints the rest of the values along with the x-value that gnuplot will implicitly assign to them. Between the two datasets, I insert 2 blank lines so that gnuplot can tell them apart using the index keyword (here abbreviated to i).

Now we just need to plot the two datasets accordingly. First we set a line style so that both passes have the same style when plotted. Then we plot index 0 (the first dataset) with the xticlabels, and index 1 using the x-value, y-value pairs the Python script calculated (u 1:2). Sorry the explanation is long (and that the original version was slightly buggy). Good luck and happy gnuplotting!
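
As a worked example of the two passes, the sketch below condenses the same rearrangement into one function and applies it to the dummy data from the question. The function name two_block_output and the variable result are made up for this illustration; it is meant to show what gnuplot receives, not to replace the script above.

```python
from collections import defaultdict


def two_block_output(lines):
    """Return the two-dataset text that is fed to gnuplot, as one string."""
    groups = defaultdict(list)
    keys = []  # keys in order of first appearance
    for line in lines:
        key, value = line.split()
        if key not in groups:
            keys.append(key)
        groups[key].append(value)

    # Pass 1: one value per key (the last one seen), labelled with the key
    first = ["%s %s" % (k, groups[k].pop()) for k in keys]
    # Pass 2: the remaining values, with the x-value gnuplot gave each key
    second = ["%d %s" % (i, v) for i, k in enumerate(keys) for v in groups[k]]

    # Two blank lines separate dataset 0 from dataset 1
    return "\n".join(first) + "\n\n\n" + "\n".join(second) + "\n"


result = two_block_output(["a 10.1", "b 10.1", "c 10.2", "b 15.56", "a 3.20"])
print(result)
```

For the dummy data this prints a 3.20, b 15.56 and c 10.2 as the first dataset, then two blank lines, then 0 10.1 and 1 10.1 as the second. The key c contributes nothing to the second dataset because its only value was already used in the first pass.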

Upvotes: 1
