Moe

Reputation: 2730

gnuplot: Heatmap using character combinations

I am currently analysing two-character combinations in texts, and I want to visualize their frequencies in a heatmap using gnuplot. My input file has the following format (COUNT stands for the actual number of occurrences of the combination):

a a COUNT
a b COUNT
...
z y COUNT
z z COUNT

Now I'd like to create a heatmap (like the first one that is shown on this site). On the x axis as well as on the y axis I'd like to display the characters from a to z, i.e.

a
b
...
z
     a b ... z

I am pretty new to gnuplot, so I tried plot "input.dat" using 2:1:3 with images, which resulted in the error message "Can't plot with an empty x range". My naive attempt to fix this by running set xrange['a':'z'] did not help.

There are a number of related questions on SO, but they either deal with numeric x-values (e.g. Heatmap with Gnuplot on a non-uniform grid) or with different input data formats (e.g. gnuplot: label x and y-axis of matrix (heatmap) with row and column names).

So my question is: What is the easiest way to transform my input file into a nice gnuplot heatmap?

Upvotes: 3

Views: 1388

Answers (2)

theozh

Reputation: 25692

Edit: Revised code, better sticking to the original question.

Your question basically boils down to: is there an ord() function in gnuplot? Answer: No, there is not, but you can build it yourself, without the need to call external scripts. The "ASCII trick" is taken from here: how can I find out the ASCII code of a character in gnuplot

The following example works with gnuplot>=4.6.0 (version at the time of OP's question).

Code:

### plotting heatmap from "alphabetical data"
reset

# definition of chr() and ord()
chr(n) = sprintf('%c',n)
ASCII = ''; do for [i=1:255] {ASCII = ASCII.chr(i)}
ord(c) = strstrt(ASCII,c)

FILE = "SO20428010.dat"
# create some random test data
set print FILE
    do for [i=1:26] for [j=1:26] {
        print sprintf("%s %s %d", chr(i+96), chr(j+96), int(rand(0)*101))
    }
set print

set size square
set xrange[0:27]
set yrange[27:0] reverse
set key noautotitle
set palette rgb 33,13,10

ChrToInt(col) = ord(strcol(col))-96

plot FILE u (ChrToInt(1)):(ChrToInt(2)):3:xtic(1):ytic(2) w image
### end of code

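To make the lookup-string idea behind ord() easier to follow, here is a minimal Python sketch of the same trick. It is only an illustration, not part of the gnuplot solution: the names lookup_ord and chr_to_int are hypothetical, and Python's str.find stands in for gnuplot's strstrt (which is 1-based, hence the + 1).

```python
# Python counterpart of the gnuplot lookup-string trick (illustration only).
# ASCII holds the characters with codes 1..255 in order, so the 1-based
# position of a character in it equals its character code.
ASCII = "".join(chr(i) for i in range(1, 256))


def lookup_ord(c):
    """Hypothetical helper: 1-based position of c in ASCII, 0 if absent."""
    return ASCII.find(c) + 1


def chr_to_int(c):
    """Map 'a'..'z' to 1..26, using the same offset of 96 as ChrToInt()."""
    return lookup_ord(c) - 96


print(lookup_ord("a"), chr_to_int("a"), chr_to_int("z"))  # 97 1 26
```

The offset of 96 is what places 'a' at position 1 and 'z' at position 26, which is why the axis ranges in the gnuplot code run from 0 to 27.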

Upvotes: 1

andyras

Reputation: 15910

You need to convert the alphabet characters to integers. It might be possible to do this somehow in gnuplot, but it would probably be messy.

My solution would be to use a quick Python script to convert the data file (let's say it is called data.dat):

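#!/usr/bin/env python3

# Convert lines like "a b COUNT" to "0 1 COUNT" so gnuplot gets numeric coordinates.
with open('data.dat') as src, open('data2.dat', 'w') as dst:
    for line in src:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        x = ord(fields[0].lower()) - ord('a')  # 'a' -> 0, 'b' -> 1, ...
        y = ord(fields[1].lower()) - ord('a')
        dst.write("%d %d %s\n" % (x, y, fields[2]))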

This takes a file like this:

a a 1
a b 2
a c 3
b a 4
b b 5
b c 6
c a 7
c b 8
c c 9

and converts it to:

0 0 1
0 1 2
0 2 3
1 0 4
1 1 5
1 2 6
2 0 7
2 1 8
2 2 9

Then you can plot it in gnuplot:

#!/usr/bin/env gnuplot

set terminal pngcairo
set output 'test.png'

set xtics ("a" 0, "b" 1, "c" 2)
set ytics ("a" 0, "b" 1, "c" 2)

set xlabel 'First Character'
set ylabel 'Second Character'

set title 'Character Combination Counts'

plot 'data2.dat' with image

It's a little clunky to set the tics manually that way, but it works fine.
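If typing the tics by hand for all 26 letters is too tedious, the tic lists could also be generated. The following is a rough sketch, assuming the same 0-based numbering as the conversion script above; tics_command is a hypothetical helper name, and its output is meant to be pasted into (or written to) the gnuplot script.

```python
import string


def tics_command(axis, letters=string.ascii_lowercase):
    """Build a gnuplot 'set xtics'/'set ytics' line labelling 0..n-1 with letters."""
    pairs = ", ".join('"%s" %d' % (ch, i) for i, ch in enumerate(letters))
    return "set %stics (%s)" % (axis, pairs)


# Reproduces the hand-written line from the script above:
print(tics_command("x", "abc"))  # set xtics ("a" 0, "b" 1, "c" 2)

# Full alphabet for both axes:
print(tics_command("x"))
print(tics_command("y"))
```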


Upvotes: 4
