Reputation: 2730
I am currently analysing two-character combinations in texts, and I want to visualize their frequencies in a heatmap using gnuplot. My input file has the following format (COUNT stands for the actual number of occurrences of this combination):
a a COUNT
a b COUNT
...
z y COUNT
z z COUNT
Now I'd like to create a heatmap (like the first one that is shown on this site). On both the x axis and the y axis I'd like to display the characters from A to Z, i.e.
a
b
...
z
a b ... z
I am pretty new to gnuplot, so I tried plot "input.dat" using 2:1:3 with images, which results in the error message "Can't plot with an empty x range". My naive attempt to run set xrange['a':'z'] did not help much.
There are a bunch of related questions on SO, but they either deal with numeric x values (e.g. Heatmap with Gnuplot on a non-uniform grid) or with different input data formats (e.g. gnuplot: label x and y-axis of matrix (heatmap) with row and column names).
So my question is: What is the easiest way to transform my input file into a nice gnuplot heatmap?
Upvotes: 3
Views: 1388
Reputation: 25692
Edit: Revised the code to stick more closely to the original question.
Your question basically boils down to: is there an ord() function in gnuplot?
Answer: No, there is not, but you can build it yourself without calling external scripts. The "ASCII trick" is taken from here: how can I find out the ASCII code of a character in gnuplot
The following example works with gnuplot >= 4.6.0 (the current version at the time of the OP's question).
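Why the lookup works: strstrt() returns a 1-based position (0 if the substring is not found), and in a string built from chr(1) through chr(255) the character with code n sits at position n. The Python model below only illustrates that reasoning (Python already has a built-in ord()); the helper names ord_via_lookup and chr_to_int are hypothetical and simply mirror the gnuplot definitions ord() and ChrToInt().

```python
# Model of the gnuplot "ASCII trick": ASCII = chr(1) . chr(2) . ... . chr(255)
ASCII = "".join(chr(i) for i in range(1, 256))


def ord_via_lookup(c):
    # gnuplot's strstrt() is 1-based and returns 0 when not found;
    # Python's str.find() is 0-based and returns -1, so add 1 to match.
    return ASCII.find(c) + 1


def chr_to_int(c):
    # Mirrors ChrToInt(col) = ord(strcol(col)) - 96: 'a' -> 1, ..., 'z' -> 26
    return ord_via_lookup(c) - 96


if __name__ == "__main__":
    print(ord_via_lookup("a"), chr_to_int("a"), chr_to_int("z"))  # 97 1 26
```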
Code:
### plotting heatmap from "alphabetical data"
reset
# definition of chr() and ord()
chr(n) = sprintf('%c',n)
ASCII = ''; do for [i=1:255] {ASCII = ASCII.chr(i)}
ord(c) = strstrt(ASCII,c)
FILE = "SO20428010.dat"
# create some random test data
set print FILE
do for [i=1:26] for [j=1:26] {
    print sprintf("%s %s %d", chr(i+96), chr(j+96), int(rand(0)*101))
}
set print
set size square
set xrange[0:27]
set yrange[27:0] reverse
set key noautotitle
set palette rgb 33,13,10
ChrToInt(col) = ord(strcol(col))-96
plot FILE u (ChrToInt(1)):(ChrToInt(2)):3:xtic(1):ytic(2) w image
### end of code
Upvotes: 1
Reputation: 15910
You need to convert the alphabet characters to integers. It might be possible to do this somehow in gnuplot, but it would probably be messy.
My solution would be to use a quick Python script to convert the data file (let's say it is called data.dat):
#!/usr/bin/env python3
# Convert "a b COUNT" lines to "0 1 COUNT" (a -> 0, b -> 1, ..., z -> 25)
with open('data.dat', 'r') as i:
    with open('data2.dat', 'w') as o:
        for line in i:
            fields = line.split()
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            x = str(ord(fields[0].lower()) - ord('a'))
            y = str(ord(fields[1].lower()) - ord('a'))
            o.write("%s %s %s\n" % (x, y, fields[2]))
This takes a file like this:
a a 1
a b 2
a c 3
b a 4
b b 5
b c 6
c a 7
c b 8
c c 9
and converts it to:
0 0 1
0 1 2
0 2 3
1 0 4
1 1 5
1 2 6
2 0 7
2 1 8
2 2 9
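The conversion script assumes every letter pair appears in data.dat. In real texts some pairs may never occur (for example "qz"), and since gnuplot's with image style draws a regular grid of points, such gaps may lead to warnings or a distorted image. A minimal sketch, assuming whitespace-separated "a b COUNT" lines and that a missing pair should simply count as 0, which emits all 26 x 26 cells; the function name to_grid_lines is hypothetical.

```python
import string


def to_grid_lines(lines):
    """Convert 'a b COUNT' lines into 'x y COUNT' lines for all 26x26 pairs.

    Assumption: pairs missing from the input get a count of 0.
    """
    counts = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        counts[(fields[0].lower(), fields[1].lower())] = fields[2]

    out = []
    letters = string.ascii_lowercase
    for x, first in enumerate(letters):
        for y, second in enumerate(letters):
            out.append("%d %d %s" % (x, y, counts.get((first, second), "0")))
    return out


if __name__ == "__main__":
    grid = to_grid_lines(["a a 1", "a b 2", "b a 4"])
    print(len(grid))  # 676
    print(grid[2])    # 0 2 0  (pair "a c" was missing)
    print(grid[26])   # 1 0 4
```

To write data2.dat with it, join the returned lines with "\n" and write them to the file in place of the line-by-line loop.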
Then you can plot it in gnuplot:
#!/usr/bin/env gnuplot
set terminal pngcairo
set output 'test.png'
set xtics ("a" 0, "b" 1, "c" 2)
set ytics ("a" 0, "b" 1, "c" 2)
set xlabel 'First Character'
set ylabel 'Second Character'
set title 'Character Combination Counts'
plot 'data2.dat' with image
It's a little clunky to set the tics manually that way, but it works fine.
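For the full alphabet, writing 26 tic labels per axis by hand is where most of the clunkiness comes from. A minimal sketch that generates the same kind of set xtics / set ytics command in Python, so it can be pasted into (or written at the top of) the gnuplot script; the function name tics_command is hypothetical.

```python
import string


def tics_command(axis, letters=string.ascii_lowercase):
    """Build a gnuplot 'set <axis>tics (...)' command that labels 0, 1, ... with letters."""
    labels = ", ".join('"%s" %d' % (c, i) for i, c in enumerate(letters))
    return "set %stics (%s)" % (axis, labels)


if __name__ == "__main__":
    print(tics_command("x", "abc"))  # set xtics ("a" 0, "b" 1, "c" 2)
    print(tics_command("y"))         # all 26 labels, "a" 0 through "z" 25
```

With letters="abc" this reproduces the two tic lines from the script above exactly; with the default it covers a to z.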
Upvotes: 4