pratt

Reputation: 29

Clustermapping in Python using Seaborn

I am trying to create a heatmap with dendrograms in Python using Seaborn, and I have a CSV file with about 900 rows. I'm importing the file as a pandas DataFrame and attempting to plot it, but a large number of the rows are not represented in the heatmap. What am I doing wrong?

This is the code I have right now, but the heatmap only shows about 49 rows. Here is an image of the clustermap I've obtained; it is not displaying all of my data.

import seaborn as sns
import pandas as pd
from matplotlib import pyplot as plt

# Data set
df = pd.read_csv('diff_exp_gene.csv', index_col = 0)

# Default plot
sns.clustermap(df, cmap = 'RdBu', row_cluster=True, col_cluster=True)
plt.show()

Thank you.

Upvotes: 0

Views: 896

Answers (3)

JC_CL

Reputation: 2608

Looking at the linked image, that looks like a simple case of the figure being too small or the label text being too big.

figsize=(x,y) is probably what you want.

x and y are the figure width and height in inches, which at Matplotlib's default of 100 dpi works out to size in pixels / 100. With 900 entries, and assuming you need 10 pixels of height per row to make your labels readable, you'd need 900 * 10 = 9000 pixels of height, i.e. figsize=(90, 90) (assuming a square matrix).

That of course requires a lot of zooming to be able to read anything (or a huge screen/plot), but you simply can't put 900 lines of text in a household-sized image.
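The arithmetic above can be sketched as a small helper. This is only an illustration: the function name clustermap_figsize and its parameters are made up for this example, and it ignores the space the dendrograms and colorbar take up, so treat the result as a starting point rather than an exact fit.

```python
def clustermap_figsize(n_rows, n_cols, px_per_label=10, dpi=100, min_inches=4.0):
    """Return (width, height) in inches so each label gets about px_per_label pixels.

    Hypothetical helper, not part of seaborn. The dendrograms and colorbar
    also use part of the figure, so the heatmap itself ends up a bit smaller.
    """
    width = max(n_cols * px_per_label / dpi, min_inches)
    height = max(n_rows * px_per_label / dpi, min_inches)
    return (width, height)


# 900 rows x 900 columns at 10 px per label and 100 dpi
print(clustermap_figsize(900, 900))

# Possible use with the DataFrame from the question (assumes df is loaded):
# sns.clustermap(df, cmap='RdBu', figsize=clustermap_figsize(*df.shape))
```

With a tall, narrow table (many genes, few samples), the min_inches floor keeps the width from collapsing to a fraction of an inch.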

Upvotes: 0

RDoc

Reputation: 346

As far as I know, keywords that apply to seaborn heatmaps also apply to clustermap, since sns.clustermap passes its extra keyword arguments on to sns.heatmap. In that case, all you need to do in your example is pass yticklabels=True as a keyword argument to sns.clustermap(). That will make the labels for all 900 rows appear.

By default, it is set to "auto", which skips labels to avoid overlap. The same applies to xticklabels. See more here: https://seaborn.pydata.org/generated/seaborn.heatmap.html
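A minimal sketch of the yticklabels=True approach, using a small synthetic DataFrame as a stand-in for the CSV (the gene_*/sample_* names, the 60 x 5 shape, the figsize, and the font size are all assumptions for illustration, not values from the question):

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the CSV: 60 rows ("genes") x 5 columns ("samples")
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(60, 5)),
    index=[f"gene_{i}" for i in range(60)],
    columns=[f"sample_{j}" for j in range(5)],
)

# yticklabels=True is forwarded to sns.heatmap and requests a label for every row
g = sns.clustermap(df, cmap="RdBu", yticklabels=True, figsize=(6, 12))

# Shrink the row-label font so neighbouring labels are less likely to overlap
g.ax_heatmap.tick_params(axis="y", labelsize=6)
```

With 900 rows the labels will still collide unless the figure is made correspondingly taller, so this is probably best combined with a larger figsize.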

Upvotes: 0

Jeffrey Chiu

Reputation: 17

An alternative approach would be to use imshow in matplotlib. I'm not exactly sure what your question is, but here is a way to plot points on a plane from a CSV file:

import csv

import numpy as np
import matplotlib.pyplot as plt

types = 'A'  # placeholder: the TYPE value to plot (left undefined in the original)

# Count how many rows of the chosen type fall on each (X, Y) cell of a 128x128 grid
temp = np.zeros((128, 128), dtype=int)
with open('diff_exp_gene.csv', newline='') as infile:
    for row in csv.DictReader(infile):
        if row['TYPE'] == types:
            temp[int(row['Y']), int(row['X'])] += 1

plt.imshow(temp, cmap='hot', origin='lower')
plt.show()

Upvotes: 0
