Reputation: 25416
I have the following data frame my_df:
   my_1  my_2  my_3
--------------------------------
0     5     7     4
1     3     5    13
2     1     2     8
3    12     9     9
4     6     1     2
I want to make a plot where the x-axis is categorical, with the values my_1, my_2, and my_3, and the y-axis is integer-valued. For each column in my_df, I want to plot all 5 of its values at x = my_i. What kind of plot should I use in matplotlib? Thanks!
Upvotes: 1
Views: 3065
Reputation: 880927
You could make a bar chart:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'my_1': [5, 3, 1, 12, 6], 'my_2': [7, 5, 2, 9, 1], 'my_3': [4, 13, 8, 9, 2]})
# Transpose so that each original column (my_1, my_2, my_3) becomes one x-axis category
df.T.plot(kind='bar')
plt.show()
or a scatter plot:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'my_1': [5, 3, 1, 12, 6], 'my_2': [7, 5, 2, 9, 1], 'my_3': [4, 13, 8, 9, 2]})
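fig, ax = plt.subplots()
# One integer x position per column: 0, 1, 2
cols = np.arange(len(df.columns))
# Repeat each x position once for every row in that column
x = np.repeat(cols, len(df))
# Flatten the values column by column so they line up with x
y = df.values.ravel(order='F')
# Color each point by its row index
color = np.tile(np.arange(len(df)), len(df.columns))
scatter = ax.scatter(x, y, s=150, c=color)
# Label the integer x positions with the column names
ax.set_xticks(cols)
ax.set_xticklabels(df.columns)
cbar = plt.colorbar(scatter)
cbar.set_ticks(np.arange(len(df)))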
plt.show()
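To see why the x, y, and color arrays in the scatter plot line up, it can help to build them without plotting. This is only a small sketch using the same df as above; the final loop is an illustrative check, not part of the plotting code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'my_1': [5, 3, 1, 12, 6],
                   'my_2': [7, 5, 2, 9, 1],
                   'my_3': [4, 13, 8, 9, 2]})

cols = np.arange(len(df.columns))   # [0, 1, 2]

# Each column position repeated once per row:
# [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
x = np.repeat(cols, len(df))

# Column-major ("Fortran") flattening, so all of my_1 comes first:
# [5, 3, 1, 12, 6, 7, 5, 2, 9, 1, 4, 13, 8, 9, 2]
y = df.values.ravel(order='F')

# Row index cycled once per column:
# [0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
color = np.tile(np.arange(len(df)), len(df.columns))

# Illustrative check: point i is the value at row color[i], column x[i]
for xi, yi, ci in zip(x, y, color):
    assert df.iloc[ci, xi] == yi
```

Using order='F' is what keeps y aligned with x; the default row-major flattening would interleave the columns instead.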
Just for fun, here is how to make the same scatter plot using Pandas' df.plot:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'my_1': [5, 3, 1, 12, 6], 'my_2': [7, 5, 2, 9, 1], 'my_3': [4, 13, 8, 9, 2]})
columns = df.columns
index = df.index
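# Reshape to long form: one row per (row index, column name) pair
df = df.stack()
df.index.names = ['color', 'column']
df = df.rename('y').reset_index()
# Map each column name to an integer x position
df['x'] = pd.Categorical(df['column']).codes
ax = df.plot(kind='scatter', x='x', y='y', c='color', colorbar=True,
             cmap='viridis', s=150)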
ax.set_xticks(np.arange(len(columns)))
ax.set_xticklabels(columns)
cbar = ax.collections[-1].colorbar
cbar.set_ticks(index)
plt.show()
Unfortunately, it requires quite a bit of DataFrame manipulation just to call df.plot, and then some extra matplotlib calls are still needed to set the tick marks on the scatter plot and colorbar. Since Pandas is not saving effort here, I would go with the NumPy/matplotlib scatter plot approach shown above.
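For reference, here is a rough sketch of what the reshaping steps in the Pandas version produce before plotting. The name long_df is only used here for illustration; the code above reassigns df instead.

```python
import pandas as pd

df = pd.DataFrame({'my_1': [5, 3, 1, 12, 6],
                   'my_2': [7, 5, 2, 9, 1],
                   'my_3': [4, 13, 8, 9, 2]})

# stack() turns the 5x3 frame into a Series indexed by (row, column name)
long_df = df.stack()
long_df.index.names = ['color', 'column']

# Name the values 'y' and move both index levels into ordinary columns
long_df = long_df.rename('y').reset_index()

# Integer code for each column name: my_1 -> 0, my_2 -> 1, my_3 -> 2
long_df['x'] = pd.Categorical(long_df['column']).codes

print(long_df.head())
```

The result has 15 rows and the columns color, column, y, and x, which is the layout that df.plot(kind='scatter') needs.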
Upvotes: 2