Reputation: 41
I am trying to keep the first row for each unique date for Mondays, but keep the last row for each unique date for Fridays.
I tried adding an indicator column where Monday = 1 and Friday = 2.
I can't seem to figure out what would be the best way to go about this? I tried using drop_duplicates
but I only know how to use it to select columns for which duplicates will be checked against but I can't figure out how to toggle the keep
between first
and last
depending on the value of "Day" in each row.
year month day Time Close Date Day %Delta MFIndicator
2926 2001 1 5 0 0.9525 2001-01-05 Friday -0.000734 2
2985 2001 1 5 1 0.9534 2001-01-05 Friday 0.000945 2
3045 2001 1 5 2 0.9534 2001-01-05 Friday 0.000000 2
3105 2001 1 5 3 0.9541 2001-01-05 Friday 0.000734 2
3165 2001 1 5 4 0.9574 2001-01-05 Friday 0.003459 2
... ... ... ... ... ... ... ... ... ...
6700166 2019 12 30 19 1.1200 2019-12-30 Monday -0.000179 1
6700226 2019 12 30 20 1.1195 2019-12-30 Monday -0.000446 1
6700286 2019 12 30 21 1.1198 2019-12-30 Monday 0.000268 1
6700346 2019 12 30 22 1.1197 2019-12-30 Monday -0.000089 1
6700406 2019 12 30 23 1.1200 2019-12-30 Monday 0.000268 1
This is the expected result: (XXXX,YYYY are some index but I think the picture is clear)
year month day Time Close Date Day %Delta MFIndicator
XXXX 2001 1 5 23 0.9574 2001-01-05 Friday 0.003459 2
... ... ... ... ... ... ... ... ... ...
YYYY 2019 12 30 0 0.1200 2019-12-30 Monday -0.000179 1
Upvotes: 1
Views: 584
Reputation: 1065
Keep all, then postprocess within a groupby, kind of like this:
dup_mask = df.duplicated(subset=['Date'], keep=False)
df_dup = df[dup_mask].groupby('Date', as_index=False).apply(lambda gr: gr.iloc[0] if gr['Day'].iloc[0] == 'Monday' else gr.iloc[-1])
df = pd.concat(df[~dup_mask], df_dup)
Upvotes: 0
Reputation: 13349
df = pd.DataFrame({'A': ['m', 'f', 'm', 'm', 'f', 'f'],
'B': np.arange(6), 'C': np.arange(6)})
A B C
0 m 0 0
1 f 1 1
2 m 2 2
3 m 3 3
4 f 4 4
5 f 5 5
(df.groupby('A', as_index=False).apply(lambda x: x.iloc[[0]] if x['A'].values[0]=='m' else x.iloc[[-1]])
.reset_index(level=0, drop=True))
A B C
4 f 4 4
0 m 0 0
Logic will be same. In place of A you will have Day column and you will do groupby Date column.
Upvotes: 1