Edamame

Reputation: 25416

Matplotlib: plot the entire column values in pandas

I have the following data frame my_df:

        my_1     my_2     my_3
--------------------------------
0         5       7        4
1         3       5       13
2         1       2        8
3        12       9        9
4         6       1        2

I want to make a plot where the x-axis is categorical, with the values my_1, my_2, and my_3, and the y-axis is integer-valued. For each column in my_df, I want to plot all 5 of its values at x = my_i. What kind of plot should I use in matplotlib? Thanks!

Upvotes: 1

Views: 3065

Answers (1)

unutbu

Reputation: 880927

You could make a bar chart:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'my_1': [5, 3, 1, 12, 6], 'my_2': [7, 5, 2, 9, 1], 'my_3': [4, 13, 8, 9, 2]})

df.T.plot(kind='bar')
plt.show()

or a scatter plot:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'my_1': [5, 3, 1, 12, 6], 'my_2': [7, 5, 2, 9, 1], 'my_3': [4, 13, 8, 9, 2]})

fig, ax = plt.subplots()
cols = np.arange(len(df.columns))
x = np.repeat(cols, len(df))
y = df.values.ravel(order='F')
color = np.tile(np.arange(len(df)), len(df.columns))
scatter = ax.scatter(x, y, s=150, c=color)
ax.set_xticks(cols)
ax.set_xticklabels(df.columns)
cbar = plt.colorbar(scatter)
cbar.set_ticks(np.arange(len(df)))
plt.show()
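
To see how the three arrays passed to ax.scatter line up, here is a minimal sketch that builds them without plotting. It reuses the same df as above; the only change is df.to_numpy() in place of df.values, which should behave the same here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'my_1': [5, 3, 1, 12, 6], 'my_2': [7, 5, 2, 9, 1], 'my_3': [4, 13, 8, 9, 2]})

cols = np.arange(len(df.columns))        # one integer position per column: 0, 1, 2
x = np.repeat(cols, len(df))             # each position repeated once per row
y = df.to_numpy().ravel(order='F')       # column-major flatten: all of my_1, then my_2, then my_3
color = np.tile(np.arange(len(df)), len(df.columns))  # row number, restarted for each column

print(x.tolist())      # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
print(y.tolist())      # [5, 3, 1, 12, 6, 7, 5, 2, 9, 1, 4, 13, 8, 9, 2]
print(color.tolist())  # [0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
```

The order='F' flatten is what keeps y aligned with x: without it, ravel would walk the rows first and the values would land on the wrong columns.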

Just for fun, here is how to make the same scatter plot using Pandas' df.plot:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'my_1': [5, 3, 1, 12, 6], 'my_2': [7, 5, 2, 9, 1], 'my_3': [4, 13, 8, 9, 2]})

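# Keep the original labels for the tick marks set below
columns = df.columns
index = df.index
# Reshape to long form: one row per (row label, column name) pair
df = df.stack()
df.index.names = ['color', 'column']
df = df.rename('y').reset_index()
# Map each column name to an integer x position
df['x'] = pd.Categorical(df['column']).codes
ax = df.plot(kind='scatter', x='x', y='y', c='color', colorbar=True,
             cmap='viridis', s=150)
# Label the integer positions with the column names
ax.set_xticks(np.arange(len(columns)))
ax.set_xticklabels(columns)
# Show one colorbar tick per original row
cbar = ax.collections[-1].colorbar
cbar.set_ticks(index)
plt.show()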

Unfortunately, this requires quite a bit of DataFrame manipulation just to call df.plot, and extra matplotlib calls are still needed to set the tick marks on the scatter plot and colorbar. Since Pandas does not save any effort here, I would go with the NumPy/matplotlib scatter plot shown above.
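
For reference, the long-form reshaping that df.plot needs can be looked at on its own. The sketch below is not the code from the answer: it swaps in rename_axis and melt for the stack-based steps, and long_df is a name chosen here for illustration. It should produce the same columns (color, column, y, x), though in column-by-column rather than row-by-row order.

```python
import pandas as pd

df = pd.DataFrame({'my_1': [5, 3, 1, 12, 6], 'my_2': [7, 5, 2, 9, 1], 'my_3': [4, 13, 8, 9, 2]})

# Name the row index 'color', turn it into a column, then unpivot the value columns.
long_df = (df.rename_axis('color')
             .reset_index()
             .melt(id_vars='color', var_name='column', value_name='y'))

# Integer x position for each column name (my_1 -> 0, my_2 -> 1, my_3 -> 2)
long_df['x'] = pd.Categorical(long_df['column']).codes

print(long_df.head())
```

Either way, the tick labels and colorbar ticks still have to be set through matplotlib afterwards, so this does not change the conclusion above.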

Upvotes: 2