Reputation: 157
I am trying to import a pre-calculated distance matrix with pandas so that I can plot it as a heatmap with seaborn. I used the following code:
import pandas as pd
msa = pd.read_csv("Multiple_alignment_distance_matrix.csv")
The output below does not look like a distance matrix.
sp|Q9BYW2|SETD2_HUMAN Histone-lysine N-methyltransferase SETD2 OS=Homo sapiens OX=9606 GN=SETD2 PE=1 SV=3 sp|P42858|HD_HUMAN Huntingtin OS=Homo sapiens OX=9606 GN=HTT PE=1 SV=2 sp|Q8IUH5|ZDH17_HUMAN Palmitoyltransferase ZDHHC17 OS=Homo sapiens OX=9606 GN=ZDHHC17 PE=1 SV=2 sp|O75400|PR40A_HUMAN Pre-mRNA-processing factor 40 homolog A OS=Homo sapiens OX=9606 GN=PRPF40A PE=1 SV=2 tr|F8VU11|F8VU11_HUMAN PRP40 pre-mRNA processing factor 40 homolog B (Yeast), isoform CRA_a OS=Homo sapiens OX=9606 GN=PRPF40B PE=1 SV=2 sp|Q6NWY9|PR40B_HUMAN Pre-mRNA-processing factor 40 homolog B OS=Homo sapiens OX=9606 GN=PRPF40B PE=1 SV=1 sp|P43357|MAGA3_HUMAN Melanoma-associated antigen 3 OS=Homo sapiens OX=9606 GN=MAGEA3 PE=1 SV=1 tr|A0A024RBM8|A0A024RBM8_HUMAN AMPylator FICD OS=Homo sapiens OX=9606 GN=HYPE PE=3 SV=1 sp|Q9BVA6|FICD_HUMAN Protein adenylyltransferase FICD OS=Homo sapiens OX=9606 GN=FICD PE=1 SV=2 tr|B3KSH4|B3KSH4_HUMAN Huntingtin interacting protein 2, isoform CRA_a OS=Homo sapiens OX=9606 GN=HIP2 PE=2 SV=1 tr|B4DIZ2|B4DIZ2_HUMAN cDNA FLJ57995, moderately similar to Ubiquitin-conjugating enzyme E2-25 kDa OS=Homo sapiens OX=9606 PE=2 SV=1 sp|P61086|UBE2K_HUMAN Ubiquitin-conjugating enzyme E2 K OS=Homo sapiens OX=9606 GN=UBE2K PE=1 SV=3
0 sp|Q9BYW2|SETD2_HUMAN Histone-lysine N-methylt... 2564 409 69 114 109 107 41 89 89 9 13 19
1 sp|P42858|HD_HUMAN Huntingtin OS=Homo sapiens ... 409 3142 90 126 143 143 59 58 58 15 14 18
2 sp|Q8IUH5|ZDH17_HUMAN Palmitoyltransferase ZDH... 69 90 632 5 10 10 1 16 16 0 2 2
3 sp|O75400|PR40A_HUMAN Pre-mRNA-processing fact... 114 126 5 957 502 498 15 5 5 0 0 0
4 tr|F8VU11|F8VU11_HUMAN PRP40 pre-mRNA processi... 109 143 10 502 892 870 17 3 3 0 0 0
5 sp|Q6NWY9|PR40B_HUMAN Pre-mRNA-processing fact... 107 143 10 498 870 871 16 3 3 0 0 0
6 sp|P43357|MAGA3_HUMAN Melanoma-associated anti... 41 59 1 15 17 16 314 1 1 0 0 0
7 tr|A0A024RBM8|A0A024RBM8_HUMAN AMPylator FICD ... 89 58 16 5 3 3 1 458 458 19 29 42
8 sp|Q9BVA6|FICD_HUMAN Protein adenylyltransfera... 89 58 16 5 3 3 1 458 458 19 29 42
9 tr|B3KSH4|B3KSH4_HUMAN Huntingtin interacting ... 9 15 0 0 0 0 0 19 19 97 67 97
10 tr|B4DIZ2|B4DIZ2_HUMAN cDNA FLJ57995, moderate... 13 14 2 0 0 0 0 29 29 67 139 139
11 sp|P61086|UBE2K_HUMAN Ubiquitin-conjugating en... 19 18 2 0 0 0 0 42 42 97 139 200
The columns look fine, but the rows are indexed as 0, 1, 2, ... instead of by protein name. I tried to create the heatmap from this:
import seaborn as sns
sns.heatmap(msa)
But I get a TypeError. I have tried reading the pandas and scipy documentation, but I am having a hard time understanding them.
Upvotes: 0
Views: 233
Reputation: 262234
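As I expected, you can add the index_col=0 parameter to your read_csv function. The first column of the file then becomes the row index instead of a data column of strings, and that string column is most likely what triggers the TypeError in sns.heatmap: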
import pandas as pd
import seaborn as sns
df = pd.read_csv('Multiple_alignment_distance_matrix.csv', index_col=0)
sns.heatmap(df)
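To see what index_col=0 changes, here is a minimal sketch on a made-up 3x3 matrix read from an in-memory string. The labels A, B, C and the numbers are hypothetical stand-ins for the real file, and `csv_text`, `raw` and `numeric_only` are names chosen for illustration only.

```python
import io
import pandas as pd

# Hypothetical 3x3 matrix standing in for the real CSV file.
csv_text = """,A,B,C
A,10,2,1
B,2,8,0
C,1,0,5
"""

# Default read: the label column is kept as an ordinary column of strings,
# and pandas adds its own 0, 1, 2 row index.
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.shape)

# With index_col=0 the labels become the row index, so only numbers remain.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df.shape)

# True when every remaining column holds numeric data.
numeric_only = all(pd.api.types.is_numeric_dtype(t) for t in df.dtypes)
print(numeric_only)
```

A frame that contains only numeric columns is what sns.heatmap expects as input.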
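To shorten the axis labels to just the protein description, you can rename the rows and columns:
import re

def prot_name(s):
    # Keep only the description between the accession and " OS=".
    match = re.search(r'^[^ ]+ (.*) OS=', s)
    # Fall back to the original label if the pattern does not match.
    return match.group(1) if match else s

sns.heatmap(df.rename(columns=prot_name, index=prot_name))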
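As a quick check of the renaming step, the sketch below applies the same regular expression to two headers copied from the question. Returning the label unchanged when the pattern does not match is an assumption made here so that no label ends up as None; the `labels` and `short` names are for illustration only.

```python
import re

def prot_name(s):
    # Drop the leading accession token, keep the text up to " OS=".
    match = re.search(r'^[^ ]+ (.*) OS=', s)
    # Assumption: keep the original label when the pattern does not match.
    return match.group(1) if match else s

labels = [
    "sp|P42858|HD_HUMAN Huntingtin OS=Homo sapiens OX=9606 GN=HTT PE=1 SV=2",
    "sp|P61086|UBE2K_HUMAN Ubiquitin-conjugating enzyme E2 K OS=Homo sapiens OX=9606 GN=UBE2K PE=1 SV=3",
    "no_match_here",
]

short = [prot_name(label) for label in labels]
print(short)
# ['Huntingtin', 'Ubiquitin-conjugating enzyme E2 K', 'no_match_here']
```

The shortened names are easier to read on the heatmap axes than the full UniProt headers.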
Upvotes: 1