RZA KHK

Reputation: 35

Counting a particular value in a particular column in a dataframe?

I have a dataframe "My_data" like this:

var1, var2, var3 
123,   234, 678
443,   567, fd
324,   678, 789
12,    102, fd

I would like to count how many times "fd" occurs in the last column, "var3", or calculate the share of rows in which it occurs. In this example the output should be 2 (count) or 0.50 (proportion).

Upvotes: 1

Views: 548

Answers (2)

Andrew L

Reputation: 7038

Here's a straightforward way:

Pull absolute number of occurrences:

My_data['var3'].value_counts(normalize=False).loc['fd']
2

Pull percent of records:

My_data['var3'].value_counts(normalize=True).loc['fd']
0.5
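One caveat with the .loc['fd'] lookups above: they raise a KeyError if "fd" never appears in the column. A minimal sketch of a more forgiving variant uses Series.get with a default instead. The frame is rebuilt here so the snippet runs on its own, and it assumes var3 holds strings, as it would after reading the question's data from a CSV; the "zz" value is purely hypothetical, to show the fallback.

```python
import pandas as pd

# Rebuild the question's data (assumption: var3 is a column of strings).
My_data = pd.DataFrame({
    "var1": [123, 443, 324, 12],
    "var2": [234, 567, 678, 102],
    "var3": ["678", "fd", "789", "fd"],
})

counts = My_data["var3"].value_counts()
shares = My_data["var3"].value_counts(normalize=True)

# Series.get returns the default instead of raising when the label is absent.
fd_count = counts.get("fd", 0)
fd_share = shares.get("fd", 0.0)
missing_count = counts.get("zz", 0)  # "zz" is a hypothetical value not in the data

print(fd_count, fd_share, missing_count)
```

This keeps the same value_counts approach and only changes how the result is looked up.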

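In my timings on this four-row frame (df holds the same data as My_data), this method was also slightly faster than filtering the rows and using .shape, though the gap is small and the second measurement was noisy: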

%timeit df.var3.value_counts(normalize=True).loc['fd']
1000 loops, best of 3: 597 µs per loop

%timeit df[df['var3']=="fd"].shape[0]/df.shape[0]
The slowest run took 16.34 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 710 µs per loop

Upvotes: 0

Michael Kirchner

Reputation: 889

You can filter the rows that satisfy the criterion and use .shape to count them, so you won't need anything beyond pandas.

import pandas as pd
d = {'var1': [123, 443, 324, 12],
     'var2': [234, 567, 678, 102],
     'var3': [678, "fd", 789, "fd"]}
df = pd.DataFrame(data=d)
df[df['var3']=="fd"].shape[0]/df.shape[0]

This should give you 0.5. If you want just the count, use df[df['var3']=="fd"].shape[0].
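As a possible alternative to building the filtered frame, the boolean mask itself can be summed or averaged, since True counts as 1 and False as 0. This is only a sketch on the same example data; the names mask, count and share are introduced here for illustration.

```python
import pandas as pd

d = {'var1': [123, 443, 324, 12],
     'var2': [234, 567, 678, 102],
     'var3': [678, "fd", 789, "fd"]}
df = pd.DataFrame(data=d)

# Element-wise comparison gives a boolean Series: True where var3 equals "fd".
mask = df['var3'] == "fd"

count = int(mask.sum())     # number of matching rows
share = float(mask.mean())  # fraction of matching rows

print(count, share)
```

The mean of the mask is the matching count divided by the number of rows, so it matches the .shape ratio above without the explicit division.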

Upvotes: 0
