ThomasWest

Reputation: 505

Plot a histogram through all the columns

I have a dataset with several columns:

Country    Weight    # of food/day  ....
---------------------------------------------
USA         180         4
China       190         12
USA         150         2
Canada      300         10

I want to create a separate histogram for each column, such that histogram_1 shows the distribution of 'Country', histogram_2 shows the distribution of 'Weight', etc.

I'm currently using pandas to load and manipulate the data.

Is the easiest way to do this something like the following?

for column in df:
    plt.hist(column)
    plt.show()

Please forgive me if my idea sounds stupid.

Any help would be highly appreciated, thanks!

Upvotes: 2

Views: 11123

Answers (4)

pypulp

Reputation: 1

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"Countries" : ["USA", "Mexico", "Canada", "USA", "Mexico"],
                   "Weight" : [200, 150, 190, 60, 40],
                   "Food" : [2,6,4,4,6]})

for col in df.columns:
    plt.hist(df[col])
    plt.xlabel(col)
    plt.show()

Upvotes: 0

Joe

Reputation: 392

You can use this instead of a for loop; it generates a histogram for every numeric column:

df.hist(bins=10, figsize=(25, 20))
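
As a minimal sketch of what this call does, assuming data shaped like the sample in the question (the column name "Food" is a stand-in for "# of food/day"), df.hist draws one histogram per numeric column and skips the text column 'Country':

```python
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative data modelled on the question; "Food" is an assumed column name
df = pd.DataFrame({"Country": ["USA", "China", "USA", "Canada"],
                   "Weight": [180, 190, 150, 300],
                   "Food": [4, 12, 2, 10]})

# DataFrame.hist returns an array of Axes, one per numeric column
axes = df.hist(bins=10, figsize=(8, 4))

# Titles of the axes that were drawn; "Country" is not among them
titles = [ax.get_title() for ax in axes.ravel()]
print(titles)

# plt.show()  # uncomment to display the figure
```

If the non-numeric columns matter too, they need separate handling, for example by counting the unique values.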

Upvotes: 2

ImportanceOfBeingErnest

Reputation: 339280

Defining a histogram for non-numeric or discrete values is ambiguous. Often the real question is "how many items of each unique kind are there?". This can be answered with .value_counts. Since you say "# of histograms == # of columns (features)", we might create one subplot per column.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"Countries" : ["USA", "Mexico", "Canada", "USA", "Mexico"],
                   "Weight" : [180, 120, 100, 120, 130],
                   "Food" : [2,2,2,4,2]})

fig, axes = plt.subplots(ncols=len(df.columns), figsize=(10,5))
for col, ax in zip(df, axes):
    df[col].value_counts().sort_index().plot.bar(ax=ax, title=col)

plt.tight_layout()
plt.show()
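
A possible variation, sketched here under the assumption that you want binned histograms for the numeric columns and counts only for the non-numeric ones: branch on the column dtype. The kinds dictionary is only there to record which plot type each column received; it is not needed for plotting.

```python
import pandas as pd
import matplotlib.pyplot as plt
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({"Countries": ["USA", "Mexico", "Canada", "USA", "Mexico"],
                   "Weight": [180, 120, 100, 120, 130],
                   "Food": [2, 2, 2, 4, 2]})

fig, axes = plt.subplots(ncols=len(df.columns), figsize=(10, 4))
kinds = {}  # illustrative bookkeeping: column name -> plot type used
for col, ax in zip(df.columns, axes):
    if is_numeric_dtype(df[col]):
        # Numeric column: bin the values into a histogram
        df[col].plot.hist(ax=ax, bins=5, title=col)
        kinds[col] = "hist"
    else:
        # Non-numeric column: count occurrences of each unique value
        df[col].value_counts().sort_index().plot.bar(ax=ax, title=col)
        kinds[col] = "bar"

fig.tight_layout()
# plt.show()  # uncomment to display the figure
```

The number of bins (5 here) is an arbitrary choice and should be adjusted to the data.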


Upvotes: 3

Joe

Reputation: 12417

If you want each plot in its own window, you can do it this way:

df.set_index('Country', inplace=True)
for col in df.columns:
    df[col].plot.bar()
    plt.show()

Upvotes: 0
