muuh

Reputation: 1063

Pandas scatter plot with color-coded points

I'd like to make a scatter plot from a DataFrame in which each point gets a color depending on how often that value pair occurs. As an example, I have the following DataFrame, built from two lists of numeric values:

df = pd.DataFrame({'width': image_widths, 'height': image_heights})
df.head(10)
   height  width
0    1093    640
1    1136    639
2    1095    640
3    1136    639
4    1095    640
5    1100    640
6    1136    640
7    1136    639
8    1136    640
9    1031    640

Now, as you can see, some value pairs occur multiple times. For example, (1095/640) occurs at indices 2 and 4. How do I give this dot a color representing "two occurrences"? It would be even better if the color were picked automatically from a continuous spectrum, as in a colorbar plot, so that the shade alone gives an impression of the frequency, rather than having to look up manually what each color represents.

As an alternative to coloring, I would also appreciate having the frequency of occurrences encoded as the radius of the dots.

EDIT:

To make my question more specific: I figured out that df.groupby(['width','height']).size() gives me the count of every combination. What I lack is the skill to link this information to the color (or size) of the dots in the plot.
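If you are on pandas 1.1 or newer, DataFrame.value_counts should give the same per-pair counts as the groupby(...).size() call, just ordered by frequency instead of by key. A minimal sketch, assuming the sample values from the table above (the two lists are filled in here for illustration):

```python
import pandas as pd

# Sample data reconstructed from the table above
image_heights = [1093, 1136, 1095, 1136, 1095, 1100, 1136, 1136, 1136, 1031]
image_widths = [640, 639, 640, 639, 640, 640, 640, 639, 640, 640]
df = pd.DataFrame({'width': image_widths, 'height': image_heights})

# Count each (width, height) pair; requires pandas >= 1.1
counts = df.value_counts(['width', 'height'])
print(counts)

# The groupby approach yields the same numbers, ordered by (width, height)
by_groupby = df.groupby(['width', 'height']).size()
print(by_groupby)
```

Either Series can then be turned into a plain table with reset_index.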

Upvotes: 0

Views: 1989

Answers (1)

Stop harming Monica

Reputation: 12590

So let's make this a true Minimal, Complete, and Verifiable example:

import matplotlib.pyplot as plt
import pandas as pd

image_heights = [1093, 1136, 1095, 1136, 1095, 1100, 1136, 1136, 1136, 1031]
image_widths = [640, 639, 640, 639, 640, 640, 640, 639, 640, 640]
df = pd.DataFrame({'width': image_widths, 'height': image_heights})
print(df)

   width  height
0    640    1093
1    639    1136
2    640    1095
3    639    1136
4    640    1095
5    640    1100
6    640    1136
7    639    1136
8    640    1136
9    640    1031

You want the sizes (counts) along with the widths and heights in a DataFrame:

plot_df = df.groupby(['width','height']).size().reset_index(name='count')
print(plot_df)

   width  height  count
0    639    1136      3
1    640    1031      1
2    640    1093      1
3    640    1095      2
4    640    1100      1
5    640    1136      2
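If you would rather keep every original row (for example to keep other columns next to each point), one possible alternative is to broadcast the group size back onto each row with groupby(...).transform. Duplicated pairs are then drawn on top of each other with identical color and size. This is a sketch; the column name count simply mirrors the one used above.

```python
import pandas as pd

image_heights = [1093, 1136, 1095, 1136, 1095, 1100, 1136, 1136, 1136, 1031]
image_widths = [640, 639, 640, 639, 640, 640, 640, 639, 640, 640]
df = pd.DataFrame({'width': image_widths, 'height': image_heights})

# For every row, store how many rows share its (width, height) pair.
# 'count' counts non-missing values per group, which equals the group
# size here because the data contains no NaNs.
df['count'] = df.groupby(['width', 'height'])['width'].transform('count')
print(df)
```

The resulting df can be passed to df.plot.scatter exactly like plot_df.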

The colors and sizes in a scatter plot are controlled by the c and s keywords if you use DataFrame.plot.scatter:

# s sets the marker area, c names the column that is mapped through cmap
plot_df.plot.scatter(x='height', y='width', s=10 * plot_df['count']**2,
                     c='count', cmap='viridis')
plt.show()

[Image: Scatter plot]
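The same plot can probably also be built with plain Matplotlib, which makes it easy to label the colorbar. Note that s is the marker area in points², so s = 10 * count**2 (as in the code above) makes the marker diameter, and hence the radius, grow linearly with the count, which is the radius encoding asked about in the question. This is a sketch; the Agg backend and the output file name scatter_counts.png are assumptions for running without a display.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless run; remove to get an interactive window
import matplotlib.pyplot as plt
import pandas as pd

image_heights = [1093, 1136, 1095, 1136, 1095, 1100, 1136, 1136, 1136, 1031]
image_widths = [640, 639, 640, 639, 640, 640, 640, 639, 640, 640]
df = pd.DataFrame({'width': image_widths, 'height': image_heights})
plot_df = df.groupby(['width', 'height']).size().reset_index(name='count')

fig, ax = plt.subplots()
# c maps each count to a color via the colormap; s is an area in points^2,
# so squaring the count makes the marker radius proportional to the count.
sc = ax.scatter(plot_df['height'], plot_df['width'],
                c=plot_df['count'], s=10 * plot_df['count'] ** 2,
                cmap='viridis')
fig.colorbar(sc, ax=ax, label='occurrences')
ax.set_xlabel('height')
ax.set_ylabel('width')
fig.savefig('scatter_counts.png')  # hypothetical output file name
```

Using the returned collection sc for the colorbar keeps the color scale tied to the same data that set the marker colors.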

Upvotes: 4
