Reputation: 175
I have a pandas dataframe, here is an example:
names subject date marks
A X 15-05-2021 10
A X 22-05-2021 12
A X 29-05-2021 11
A Y 15-05-2021 15
A Y 22-05-2021 10
B P 22-05-2021 17
B P 29-05-2021 16
B Q 29-05-2021 14
B R 22-05-2021 16
I need to filter out all names, dates and subjects for which data is available for "29-05-2021".
this is what I need:
names subject date marks
A X 15-05-2021 10
A X 22-05-2021 12
A X 29-05-2021 11
B P 22-05-2021 17
B P 29-05-2021 16
B Q 29-05-2021 14
I have figured out how to do this, but I think it can be improved.
this is what I am doing as of now:
df = pd.read_excel("",name="sheet")
end_date = "29-05-2021"
end_date = pd.to_datetime(end_date)
mask = df["date"] == end_date
temp_df = df.loc[mask]
temp_df = temp_df[["names","subject"]]
result_df = pd.merge(df,temp_df, on = ["names","subject"],how="right"]
is there any scope of improvement?
Upvotes: 0
Views: 49
Reputation: 862911
I think for performance is your solution good, a bit simplified with remove on
(because joined by intersection of columns names, here name, subject
) and need inner join, which is default, so also should be omitted:
end_date = "29-05-2021"
mask = df["date"] == pd.to_datetime(end_date)
result_df = pd.merge(df,df.loc[mask, ["names","subject"]])
print (result_df)
names subject date marks
0 A X 2021-05-15 10
1 A X 2021-05-22 12
2 A X 2021-05-29 11
3 B P 2021-05-22 17
4 B P 2021-05-29 16
5 B Q 2021-05-29 14
Alternative solution with GroupBy.transform
is a bit slowier if large DataFrame
s:
end_date = "29-05-2021"
mask = df["date"] == pd.to_datetime(end_date)
df = df[df.assign(m = mask).groupby(['names','subject'])['m'].transform('any')]
Upvotes: 1
Reputation: 26676
Just groupby, and filter any date that has your specs.
df.groupby(['names','subject']).filter(lambda x:(x['date']=='29-05-2021').any())
Upvotes: 1