Sean

Reputation: 3385

Making a bar chart to represent the number of occurrences in a Pandas Series

I was wondering if anyone could help me make a bar chart showing how often each value occurs in a Pandas Series.

I start with a Pandas DataFrame of shape (2000, 7) and extract its last column, which gives a Series of shape (2000,).

The values in this Series are integers from 0 to 17, each occurring with a different frequency. I tried to plot these frequencies as a bar chart but ran into difficulties. Here is my code:

import numpy as np
import matplotlib.pyplot as plt

# First, I counted the number of occurrences.

count = np.zeros(max(data_val))

for i in range(count.shape[0]):
    for j in range(data_val.shape[0]):
        if (i == data_val[j]):
            count[i] = count[i] + 1

'''
This gives us
count = array([192., 105., ... 19.])
'''

temp = np.arange(0, 18, 1) # Array for the x-axis.

plt.bar(temp, count)

I am getting an error on the last line of code, saying that the objects cannot be broadcast to a single shape.
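The broadcast error comes from the array sizes: because the largest value in `data_val` is 17, `np.zeros(max(data_val))` creates only 17 slots (indices 0 to 16), so the loop never counts the value 17, while `temp` has 18 entries. `plt.bar` therefore receives x positions and heights of different lengths. Below is a minimal sketch of a vectorized fix, assuming `data_val` holds non-negative integers. It uses `np.bincount` with `minlength=18`, and the random `data_val` is a hypothetical stand-in for the real column.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data_val: 2000 integer labels in 0..17.
rng = np.random.default_rng(0)
data_val = pd.Series(rng.integers(0, 18, size=2000))

# The original array is one slot short: max(data_val) is 17,
# so np.zeros(17) only has indices 0..16.
too_short = np.zeros(max(data_val))

# np.bincount counts every non-negative integer in one pass;
# minlength=18 guarantees a bin for each value 0..17, even unseen ones.
count = np.bincount(data_val.to_numpy(), minlength=18)

temp = np.arange(0, 18, 1)  # x positions 0..17, same length as count
bars = plt.bar(temp, count)
plt.xlabel("Value")
plt.ylabel("Frequency")
```

Keeping the loop but sizing the array as `np.zeros(max(data_val) + 1)` would also remove the shape mismatch. `bincount` simply avoids the 2000 × 18 nested iterations.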

What I ultimately want is a bar chart where each bar corresponds to an integer value from 0 to 17, and the height of each bar (i.e. the y-axis) represents the frequencies.

Thank you.


UPDATE

I'm posting my fixed code, based on the suggestions people kindly gave below, in case it helps anyone facing similar issues in the future.

import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv("./data/train.csv") # Original data is a (2000, 7) DataFrame
# data contains 6 feature columns and 1 target column.

# Separate the design matrix from the target labels.
X = data.iloc[:, :-1]
y = data['target']


'''
The next line of code uses pandas.Series.value_counts() on y to count
the number of occurrences of each label, and then sorts the counts by
index (i.e. label).

value_counts() on its own already orders the counts by frequency (most
common first); use pandas.Series.sort_values() instead of sort_index()
if you want a different frequency-based order.

Passing x=/y= to plot.bar() has no effect on a Series, so the axis
labels are set on the returned Axes instead.
'''
ax = y.value_counts().sort_index().plot.bar()
ax.set_xlabel('Target Value')
ax.set_ylabel('Number of Occurrences')
plt.show()


There is no need for explicit loops when using the methods built into the pandas library.
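As a quick sanity check that the built-in methods count the same thing as the question's double loop, here is a minimal sketch on a small hypothetical label Series. The loop array is sized with `max + 1`, which avoids the off-by-one in the original.

```python
import numpy as np
import pandas as pd

# Hypothetical labels standing in for data['target'].
y = pd.Series([3, 0, 3, 17, 5, 0, 3])

# Loop-based counting, with one slot per value 0..max.
loop_count = np.zeros(y.max() + 1, dtype=int)
for label in y:
    loop_count[label] += 1

# Built-in equivalent: count each label, then order by label.
vc = y.value_counts().sort_index()

# value_counts() lists only labels that occur, so compare on those labels.
matches = all(loop_count[label] == n for label, n in vc.items())

# Equal totals mean the loop put no counts in any other slot.
same_total = loop_count.sum() == vc.sum()
```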

The specific methods mentioned in the answers are pandas.Series.value_counts(), pandas.Series.sort_index(), and pandas.Series.plot.bar().

Upvotes: 5

Views: 10719

Answers (2)

jezrael

Reputation: 862511

I believe you need value_counts with Series.plot.bar:

import numpy as np
import pandas as pd

df = pd.DataFrame({
         'a':[4,5,4,5,5,4],
         'b':[7,8,9,4,2,3],
         'c':[1,3,5,7,1,0],
         'd':[1,1,6,1,6,5],
})

print (df)
   a  b  c  d
0  4  7  1  1
1  5  8  3  1
2  4  9  5  6
3  5  4  7  1
4  5  2  1  6
5  4  3  0  5


df['d'].value_counts(sort=False).plot.bar()


If some values might be missing and you want them shown with a count of 0, add reindex:

df['d'].value_counts(sort=False).reindex(np.arange(18), fill_value=0).plot.bar()


Detail:

print (df['d'].value_counts(sort=False))
1    3
5    1
6    2
Name: d, dtype: int64

print (df['d'].value_counts(sort=False).reindex(np.arange(18), fill_value=0))
0     0
1     3
2     0
3     0
4     0
5     1
6     2
7     0
8     0
9     0
10    0
11    0
12    0
13    0
14    0
15    0
16    0
17    0
Name: d, dtype: int64
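One caveat depends on the pandas version. Recent pandas documents `value_counts(sort=False)` as preserving the order in which values appear in the data. For column `d` that would be 1, 6, 5 rather than the 1, 5, 6 printed above. A minimal sketch that makes the bar order explicit has two parts. It chains `sort_index()` when only observed values are wanted. It also relies on `reindex`, whose target `np.arange(18)` fixes the order 0 to 17. The axis labels are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit in a notebook
import numpy as np
import pandas as pd

df = pd.DataFrame({'d': [1, 1, 6, 1, 6, 5]})

# Observed values only, ordered by value rather than by frequency.
observed = df['d'].value_counts().sort_index()

# Every value 0..17, in that order; values that never occur get 0.
full = df['d'].value_counts().reindex(np.arange(18), fill_value=0)

ax = full.plot.bar()
ax.set_xlabel('Value')
ax.set_ylabel('Number of occurrences')
```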

Upvotes: 4

sync11

Reputation: 1280

Here's an approach using Seaborn:

import numpy as np
import pandas as pd
import seaborn as sns

s = pd.Series(np.random.choice(17, 10))
s
# 0    10
# 1    13
# 2    12
# 3     0
# 4     0
# 5     5
# 6    13
# 7     9
# 8    11
# 9     0
# dtype: int64

val, cnt = np.unique(s, return_counts=True)
val, cnt
# (array([ 0,  5,  9, 10, 11, 12, 13]), array([3, 1, 1, 1, 1, 1, 2]))

sns.barplot(x=val, y=cnt)  # newer Seaborn versions require x and y as keywords
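Note that `np.unique` returns only the values that actually occur, and `sns.barplot` treats `x` as categories by default. As a result, unseen values such as 1 to 4 get no bar at all. If every value from 0 to 17 should appear, one option is to scatter the counts into a zero-filled array of length 18. The sketch below swaps in plain NumPy indexing and Matplotlib rather than Seaborn, and it reuses the sample values printed above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# The sample values printed in the answer above.
s = pd.Series([10, 13, 12, 0, 0, 5, 13, 9, 11, 0])

val, cnt = np.unique(s, return_counts=True)

# Scatter the observed counts into a zero-filled array, one slot per value 0..17.
full = np.zeros(18, dtype=int)
full[val] = cnt

bars = plt.bar(np.arange(18), full)
plt.xlabel("Value")
plt.ylabel("Count")
```

The same two arrays could be passed to Seaborn instead, as `sns.barplot(x=np.arange(18), y=full)`.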


Upvotes: 2
