Reputation: 35
I want to create a pandas dataframe that counts the number of times the types C or P occur for each date in each sample. C and P are subsets of F, meaning that for every row where Type equals F, I want to count the rows that share the same Sample value and the same Date value.
I cannot hardcode the conditions, as the actual data set is much larger than the sample below; the conditions need to be based on the values of the rows where Type equals F.
Here is a sample of my data set, df_Data (the actual one being much larger):
| Sample | Type | Date | Strike |
|--------|------|------------|--------|
| A | F | 1 Jun 2020 | |
| A | C | 1 Jun 2020 | 5 |
| A | P | 1 Jun 2020 | 2.5 |
| A | F | 1 Dec 2020 | |
| A | P | 1 Dec 2020 | 3 |
| A | P | 1 Dec 2020 | 3.5 |
| A | P | 1 Dec 2020 | 4 |
| B | F | 1 Jun 2020 | |
| B | C | 1 Jun 2020 | 2.5 |
| B | C | 1 Jun 2020 | 3 |
| B | C | 1 Jun 2020 | 4 |
| B | F | 1 Dec 2020 | |
| B | C | 1 Dec 2020 | 2 |
| B | C | 1 Dec 2020 | 4 |
| B | P | 1 Dec 2020 | 2 |
| B | P | 1 Dec 2020 | 4 |
The corresponding output is a new dataframe like this one:
| Sample | Type | Date | Count |
|--------|------|------------|-------|
| A | F | 1 Jun 2020 | 2 |
| A | F | 1 Dec 2020 | 3 |
| B | F | 1 Jun 2020 | 3 |
| B | F | 1 Dec 2020 | 4 |
In Excel I would have used the COUNTIFS function:
IF Type = "F" THEN COUNTIFS(Sample-column, Sample-value, Date-column, Date-value, Type-column, "<>F")
Please help.
Upvotes: 1
Views: 459
Reputation: 35
I was able to get the required output based on the logic provided by @Umesh Sharma.
From the original dataframe, df, I created a dataframe without any of the F rows.
df_no_F = df[df['Type'] != "F"]
I then used groupby and count to determine the number of rows for each Sample and Date combination.
df_count = df_no_F.groupby(['Sample', 'Date'], as_index=False).count()
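For reference, here is a minimal end-to-end sketch of these two steps. The rename and the re-added Type column are my additions to reshape the result into the desired output, and the inline df is a small stand-in for the original data:
import pandas as pd

# Stand-in for the original df_Data (a subset, for illustration only)
df = pd.DataFrame({
    'Sample': ['A', 'A', 'A', 'B', 'B'],
    'Type':   ['F', 'C', 'P', 'F', 'C'],
    'Date':   ['1 Jun 2020'] * 5,
    'Strike': [None, 5, 2.5, None, 2.5],
})

df_no_F = df[df['Type'] != 'F']                          # drop the F rows
df_count = df_no_F.groupby(['Sample', 'Date'], as_index=False).count()
df_count = df_count.rename(columns={'Strike': 'Count'})  # count of non-F rows
df_count['Type'] = 'F'                                   # label the summary rows as F
print(df_count[['Sample', 'Type', 'Date', 'Count']])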
Upvotes: 2
Reputation: 34
I suggest the following code:
import pandas as pd

df = pd.read_csv('sample.csv')
# Relabel C and P rows as F; F rows keep their label, and since their Strike
# is empty they are ignored by the count aggregation below.
df['Type'] = df['Type'].apply(lambda x: 'F' if x in ('C', 'P') else x)
adf = df.groupby(['Sample', 'Type', 'Date'], as_index=False).agg({'Strike': ['count']})
This code reads sample.csv into a dataframe, then applies a lambda to the Type column so that C and P rows are relabelled as F. The final result is in adf.
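One caveat (my note, not part of the original answer): agg({'Strike': ['count']}) leaves a MultiIndex on the columns. Assuming the adf from above, one way to flatten it so the count column matches the desired output:
# Columns come back as [('Sample', ''), ('Type', ''), ('Date', ''), ('Strike', 'count')];
# assign flat names so the count column is simply 'Count'.
adf.columns = ['Sample', 'Type', 'Date', 'Count']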
Upvotes: 2
Reputation: 2684
Here's how I would do this. Use df.loc to filter down to the desired Types, then groupby and count :)
import pandas as pd
from io import StringIO
# Getting a df in the right form
s = """Sample | Type | Date | Strike
A | F | 1 Jun 2020 |
A | C | 1 Jun 2020 | 5
A | P | 1 Jun 2020 | 2.5
A | F | 1 Dec 2020 |
A | P | 1 Dec 2020 | 3
A | P | 1 Dec 2020 | 3.5
A | P | 1 Dec 2020 | 4
B | F | 1 Jun 2020 |
B | C | 1 Jun 2020 | 2.5
B | C | 1 Jun 2020 | 3
B | C | 1 Jun 2020 | 4
B | F | 1 Dec 2020 |
B | C | 1 Dec 2020 | 2
B | C | 1 Dec 2020 | 4
B | P | 1 Dec 2020 | 2
B | P | 1 Dec 2020 | 4"""
df = pd.read_csv(StringIO(s), sep='|')
df.columns = [i.strip() for i in df.columns]
for col in ['Sample', 'Type', 'Date']:
    df[col] = df[col].str.strip()
# Meat and bones of this operation
new_df = df.loc[(df['Type'] == 'C') | (df['Type'] == 'P')].groupby(['Sample', 'Date'])['Strike'].count().reset_index()
# Reformat to desired output
new_df['Type'] = 'F'
new_df = new_df[df.columns].rename(columns={'Strike': 'Count'}).sort_values(by=['Sample', 'Date'], ascending=[True, False])
print(new_df)
Output:
Sample Type Date Count
1 A F 1 Jun 2020 2
0 A F 1 Dec 2020 3
3 B F 1 Jun 2020 3
2 B F 1 Dec 2020 4
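As a side note (my suggestion, not in the original answer), Series.isin expresses the same filter more compactly and scales better if more non-F types appear later:
# Equivalent to (df['Type'] == 'C') | (df['Type'] == 'P')
new_df = df.loc[df['Type'].isin(['C', 'P'])].groupby(['Sample', 'Date'])['Strike'].count().reset_index()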
Upvotes: 1