Reputation: 35
I have a dataframe "My_data" like this:
var1, var2, var3
123, 234, 678
443, 567, fd
324, 678, 789
12, 102, fd
I would like to count how often "fd" occurs in the last column, "var3", or calculate the percentage of rows in which it occurs. In this example the output should be: output: 2 or output: 0.50
Upvotes: 1
Views: 548
Reputation: 7038
Here's a straightforward way:
Pull absolute number of occurrences:
My_data['var3'].value_counts(normalize=False).loc['fd']
2
Pull percent of records:
My_data['var3'].value_counts(normalize=True).loc['fd']
0.5
In a quick timing test this method was also slightly faster than filtering with a boolean mask (df here is the same dataframe as My_data):
%timeit df.var3.value_counts(normalize=True).loc['fd']
1000 loops, best of 3: 597 µs per loop
%timeit df[df['var3']=="fd"].shape[0]/df.shape[0]
The slowest run took 16.34 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 710 µs per loop
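One caveat with the .loc lookup above: it raises a KeyError when "fd" does not appear in the column at all. A minimal sketch of a more forgiving variant, assuming the question's data with var3 read as strings (the "zz" value is made up purely to show the missing case):

```python
import pandas as pd

# Sample data from the question; var3 is assumed to hold strings,
# as it would after reading a mixed column from a CSV file.
My_data = pd.DataFrame({
    "var1": [123, 443, 324, 12],
    "var2": [234, 567, 678, 102],
    "var3": ["678", "fd", "789", "fd"],
})

counts = My_data["var3"].value_counts()

# Series.get returns the default instead of raising KeyError
# when the label is absent from the index.
fd_count = counts.get("fd", 0)
missing_count = counts.get("zz", 0)  # "zz" is a hypothetical absent value

fd_share = fd_count / len(My_data)
print(fd_count, fd_share)
```

The same .get call works on the result of value_counts(normalize=True) if you only need the proportion.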
Upvotes: 0
Reputation: 889
You can use .shape to count the number of rows satisfying the criterion, and you won't have to import anything else.
import pandas as pd
d = {'var1': [123, 443, 324, 12],
     'var2': [234, 567, 678, 102],
     'var3': [678, "fd", 789, "fd"]}
df = pd.DataFrame(data=d)
df[df['var3']=="fd"].shape[0]/df.shape[0]
This should give you 0.5. If you want just the count, use df[df['var3']=="fd"].shape[0].
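As a possible variation on the same idea, the boolean mask can be aggregated directly instead of being used to filter the dataframe first. This is only a sketch using the sample data above; the names is_fd, fd_count and fd_share are made up for illustration. Taking the mean of the mask also avoids the explicit division, which would truncate to 0 under Python 2's integer division.

```python
import pandas as pd

# Same sample data as in the answer above.
d = {'var1': [123, 443, 324, 12],
     'var2': [234, 567, 678, 102],
     'var3': [678, "fd", 789, "fd"]}
df = pd.DataFrame(data=d)

# Boolean Series: True where var3 equals "fd".
is_fd = df['var3'] == "fd"

# True counts as 1, so sum() is the number of matches
# and mean() is the fraction of rows that match.
fd_count = int(is_fd.sum())
fd_share = float(is_fd.mean())

print(fd_count, fd_share)
```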
Upvotes: 0