will

Reputation: 35

Python - How do I do conditional counting of rows based on various columns?

I want to create a pandas dataframe that counts, for each Sample and each Date, the number of rows whose Type is C or P. C and P are subtypes of F, so for every row where Type is F I want the count of C and P rows sharing the same Sample and Date values.
I cannot hardcode the conditions because the actual data set is much larger than this sample, so the conditions must be derived from the rows where Type is equal to F.
Here is a sample of my data set, df_Data (the actual one being much larger):

Sample | Type | Date | Strike
A | F | 1 Jun 2020 |
A | C | 1 Jun 2020 | 5
A | P | 1 Jun 2020 | 2.5
A | F | 1 Dec 2020 |
A | P | 1 Dec 2020 | 3
A | P | 1 Dec 2020 | 3.5
A | P | 1 Dec 2020 | 4
B | F | 1 Jun 2020 |
B | C | 1 Jun 2020 | 2.5
B | C | 1 Jun 2020 | 3
B | C | 1 Jun 2020 | 4
B | F | 1 Dec 2020 |
B | C | 1 Dec 2020 | 2
B | C | 1 Dec 2020 | 4
B | P | 1 Dec 2020 | 2
B | P | 1 Dec 2020 | 4

The corresponding output is a new dataframe like this one:

Sample | Type | Date | Count

=======================

A | F | 1 Jun 2020 |2
A | F | 1 Dec 2020 |3
B | F | 1 Jun 2020 |3
B | F | 1 Dec 2020 |4  

In Excel I would have used the COUNTIFS function:
IF Type = "F" THEN COUNTIFS(Sample-column, Sample-value, Date-column, Date-value, Type-column, "<>F")

Please help (apologies for the poor table formatting).

Upvotes: 1

Views: 459

Answers (3)

will

Reputation: 35

I was able to get the required output based on the logic provided by @Umesh Sharma.
From the original dataframe, df, I created a dataframe without any of the F rows.

df_no_F = df[df['Type'] != "F"]  

I then use groupby and count to determine the number of rows for each Sample and Date combination.

df_count = df_no_F.groupby(['Sample' , 'Date'], as_index=False).count()
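To match the desired output exactly, the count column can then be renamed and a constant Type column added. A minimal runnable sketch of the whole approach, using an inline frame (a stand-in for the real, larger data set):

```python
import pandas as pd

# Inline stand-in for the question's df (Strike is blank on the F rows)
df = pd.DataFrame({
    'Sample': ['A', 'A', 'A', 'A', 'A'],
    'Type':   ['F', 'C', 'P', 'F', 'P'],
    'Date':   ['1 Jun 2020', '1 Jun 2020', '1 Jun 2020',
               '1 Dec 2020', '1 Dec 2020'],
    'Strike': [None, 5, 2.5, None, 3],
})

# Drop the F rows, then count remaining rows per Sample/Date
df_no_F = df[df['Type'] != 'F']
df_count = df_no_F.groupby(['Sample', 'Date'], as_index=False).count()

# Reshape to the desired output: constant Type = 'F', rename the count
df_count['Type'] = 'F'
df_count = df_count.rename(columns={'Strike': 'Count'})[
    ['Sample', 'Type', 'Date', 'Count']]
print(df_count)
```

Note that `groupby(...).count()` counts non-null values per column, which is why the counts are taken from the Strike column (non-null on every C/P row).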

Upvotes: 2

Umesh Sharma

Reputation: 34

I suggest the following code:

import pandas as pd
df = pd.read_csv('sample.csv')
df['Type'] = df['Type'].apply(lambda x: 'F' if x in ('P', 'C') else x)
adf = df.groupby(['Sample', 'Type', 'Date'], as_index=False).agg({'Strike': ['count']})

This code reads sample.csv into a dataframe, then applies a lambda to the Type column to relabel the C and P rows as F so they group together with the F rows. The grouped counts end up in adf.
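A minimal runnable sketch of this approach, with an inline frame standing in for sample.csv (and a scalar `'count'` aggregation to keep the columns flat):

```python
import pandas as pd

# Inline stand-in for sample.csv (Strike is blank on the F rows)
df = pd.DataFrame({
    'Sample': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Type':   ['F', 'C', 'P', 'F', 'C', 'C'],
    'Date':   ['1 Jun 2020'] * 6,
    'Strike': [None, 5, 2.5, None, 2.5, 3],
})

# Relabel C and P as F so all three types fall into one group
df['Type'] = df['Type'].apply(lambda x: 'F' if x in ('P', 'C') else x)

# Count non-null Strike values per Sample/Type/Date; the F rows have
# no Strike, so only the original C/P rows are counted
adf = df.groupby(['Sample', 'Type', 'Date'], as_index=False).agg({'Strike': 'count'})
print(adf)
```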

Upvotes: 2

cosmic_inquiry

Reputation: 2684

Here's how I would do this. Use df.loc to filter down to the desired Types and then groupby and count :)

import pandas as pd
from io import StringIO

# Getting a df in the right form
s = """Sample | Type | Date | Strike
A | F | 1 Jun 2020 |
A | C | 1 Jun 2020 | 5
A | P | 1 Jun 2020 | 2.5
A | F | 1 Dec 2020 |
A | P | 1 Dec 2020 | 3
A | P | 1 Dec 2020 | 3.5
A | P | 1 Dec 2020 | 4
B | F | 1 Jun 2020 |
B | C | 1 Jun 2020 | 2.5
B | C | 1 Jun 2020 | 3
B | C | 1 Jun 2020 | 4
B | F | 1 Dec 2020 |
B | C | 1 Dec 2020 | 2
B | C | 1 Dec 2020 | 4
B | P | 1 Dec 2020 | 2
B | P | 1 Dec 2020 | 4"""
df = pd.read_csv(StringIO(s), sep='|')
df.columns = [i.strip() for i in df.columns]
for col in ['Sample', 'Type', 'Date']:
    df[col] = df[col].str.strip()

# Meat and bones of this operation
new_df = (df.loc[df['Type'].isin(['C', 'P'])]
            .groupby(['Sample', 'Date'])['Strike']
            .count()
            .reset_index())

# Reformat to desired output
new_df['Type'] = 'F'
new_df = new_df[df.columns].rename(columns={'Strike': 'Count'}).sort_values(by=['Sample', 'Date'], ascending=[True, False])
print(new_df)

Output:

  Sample Type        Date  Count
1      A    F  1 Jun 2020      2
0      A    F  1 Dec 2020      3
3      B    F  1 Jun 2020      3
2      B    F  1 Dec 2020      4

Upvotes: 1
