Reputation: 505
I have a dataset with several columns.
Country   Weight   # of food/day   ....
---------------------------------------------
USA       180      4
China     190      12
USA       150      2
Canada    300      10
I want to create a separate histogram for each column, so that histogram_1 shows the distribution of 'Country', histogram_2 shows the distribution of 'Weight', etc.
I'm currently using pandas to load and manipulate the data.
Is the easiest way to do this something like the following?
for column in df:
    plt.hist(column)
    plt.show()
Please forgive me if my idea sounds stupid.
Any help would be highly appreciated, thanks!
Upvotes: 2
Views: 11123
Reputation: 1
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"Countries" : ["USA", "Mexico", "Canada", "USA", "Mexico"],
                   "Weight" : [200, 150, 190, 60, 40],
                   "Food" : [2,6,4,4,6]})

# One figure per column, labeled with the column name
for col in df.columns:
    plt.hist(df[col])
    plt.xlabel(col)
    plt.show()
Upvotes: 0
Reputation: 392
You can use this instead of a for loop; it generates histograms for all numeric columns:
df.hist(bins=10, figsize=(25, 20))
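As a small illustration of the "numeric columns only" point, here is a sketch using the sample data from the question (the column names `Country`, `Weight` and `Food` are assumed for the example). `DataFrame.hist` returns an array of Axes, and collecting their titles shows which columns were actually plotted.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample data modeled on the question; the column names are assumptions
df = pd.DataFrame({
    "Country": ["USA", "China", "USA", "Canada"],
    "Weight": [180, 190, 150, 300],
    "Food": [4, 12, 2, 10],
})

# DataFrame.hist draws one subplot per numeric column and
# returns a 2-D array of Axes; non-numeric columns are skipped
axes = df.hist(bins=10, figsize=(8, 4))

# Each used subplot is titled with its column name
titles = [ax.get_title() for ax in axes.ravel() if ax.get_title()]
print(titles)
```

The string column `Country` does not appear in `titles`, so it needs separate handling, for example with `value_counts`. Call `plt.show()` afterwards to display the figure.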
Upvotes: 2
Reputation: 339280
Defining a histogram for non-numeric or discrete values is ambiguous. Often the question is "how many items of each unique kind are there?". This can be answered with .value_counts(). Since you say "# of histograms == # of columns (features)", we might create one subplot per column.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"Countries" : ["USA", "Mexico", "Canada", "USA", "Mexico"],
                   "Weight" : [180, 120, 100, 120, 130],
                   "Food" : [2,2,2,4,2]})

# One subplot per column, each showing the count of every unique value
fig, axes = plt.subplots(ncols=len(df.columns), figsize=(10,5))
for col, ax in zip(df, axes):
    df[col].value_counts().sort_index().plot.bar(ax=ax, title=col)
plt.tight_layout()
plt.show()
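If the numeric columns are continuous rather than discrete, counting unique values gives one bar per distinct value, which may not be what you want. One possible variation, sketched here with a hypothetical helper `plot_column_distributions` (not part of pandas or matplotlib), bins numeric columns with `Axes.hist` and falls back to `value_counts` for everything else:

```python
import pandas as pd
import matplotlib.pyplot as plt
from pandas.api.types import is_numeric_dtype


def plot_column_distributions(df, bins=10):
    """Hypothetical helper: draw one subplot per column of df."""
    n = len(df.columns)
    # squeeze=False keeps axes 2-D even when there is a single column
    fig, axes = plt.subplots(ncols=n, figsize=(4 * n, 4), squeeze=False)
    axes = axes.ravel()
    for col, ax in zip(df.columns, axes):
        if is_numeric_dtype(df[col]):
            # Numeric column: a binned histogram
            ax.hist(df[col].dropna(), bins=bins)
        else:
            # Non-numeric column: count of each unique value
            df[col].value_counts().sort_index().plot.bar(ax=ax)
        ax.set_title(col)
    fig.tight_layout()
    return fig, axes


df = pd.DataFrame({"Countries": ["USA", "Mexico", "Canada", "USA", "Mexico"],
                   "Weight": [180, 120, 100, 120, 130],
                   "Food": [2, 2, 2, 4, 2]})
fig, axes = plot_column_distributions(df)
```

Call `plt.show()` to display the result. The choice of 10 bins is arbitrary; for a small integer column such as `Food`, the `value_counts` approach above may still read better.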
Upvotes: 3
Reputation: 12417
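If you want the plots in separate windows, you can do it this way:
import matplotlib.pyplot as plt

# Use the country names as the x-axis labels
df.set_index('Country', inplace=True)
# Note: this draws one bar per row, not binned counts
for col in df.columns:
    df[col].plot.bar()
    plt.show()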
Upvotes: 0