Reputation: 647
A D E
0 2002-01-13 3.3 Tuesday
1 2002-01-13 3.9 Wednesday
2 2002-01-13 1.9 Thursday
3 2002-01-13 9.0 Saturday
4 2002-01-14 0.9 Tuesday
5 2002-01-14 0.2 Wednesday
6 2002-01-14 5.1 Thursday
7 2002-01-14 7.0 Friday
8 2002-01-14 1.9 Saturday
9 2002-01-15 4.2 Tuesday
10 2002-01-15 6.7 Wednesday
11 2002-01-15 1.2 Friday
12 2002-01-15 0.6 Saturday
I have a data frame above sorted by date (column A). Column E holds day-of-week references. I need a new df, filtering rows by the following conditions.
Output should be:
A D E
0 2002-01-13 3.3 Tuesday
1 2002-01-13 3.9 Wednesday
2 2002-01-13 1.9 Thursday
3 2002-01-13 9.0 Saturday
5 2002-01-14 0.2 Wednesday
6 2002-01-14 5.1 Thursday
7 2002-01-14 7.0 Friday
8 2002-01-14 1.9 Saturday
9 2002-01-15 4.2 Tuesday
10 2002-01-15 6.7 Wednesday
11 2002-01-15 1.2 Friday
12 2002-01-15 0.6 Saturday
I´ve tried:
m1 = (group["E"] == "Wednesday")
m2 = (group["E"] == "Thursday")
grouped = df.groupby("A")
for idx, group in grouped:
if (m1|m2).any():
df[idx] = group[m1|m2]
else:
df[idx] = group[m2]
Upvotes: 0
Views: 54
Reputation: 153500
You can try this using np.where
and isin
:
df[df.groupby('A')['E']
.transform(lambda x: np.where(x.eq('Thursday').any() & x.eq('Friday').any(),
x.isin(['Wednesday','Thursday','Friday','Saturday']),
x.isin(['Tuesday','Wednesday','Thursday','Friday','Saturday'])))]
Output:
A D E
0 2002-01-13 3.3 Tuesday
1 2002-01-13 3.9 Wednesday
2 2002-01-13 1.9 Thursday
3 2002-01-13 9.0 Saturday
5 2002-01-14 0.2 Wednesday
6 2002-01-14 5.1 Thursday
7 2002-01-14 7.0 Friday
8 2002-01-14 1.9 Saturday
9 2002-01-15 4.2 Tuesday
10 2002-01-15 6.7 Wednesday
11 2002-01-15 1.2 Friday
12 2002-01-15 0.6 Saturday
Upvotes: 1
Reputation: 21739
Using a simple custom function on groupby
gets the answer:
# helper function
def get_grp(grp):
# values to consider
a = ['Thursday', 'Friday']
# set up condition
cond = all(True if i in grp['E'].tolist() else False for i in a)
# check conditions
if cond:
return grp[grp['E'].isin(['Wednesday','Thursday','Friday','Saturday'])]
else:
return grp[grp['E'].isin(['Tuesday','Wednesday','Thursday','Friday','Saturday'])]
## apply function on groupby object
df = df.groupby('A',as_index=False).apply(get_grp).reset_index(drop=True)
## output
df.head(7)
A D E
0 2002-01-13 3.3 Tuesday
1 2002-01-13 3.9 Wednesday
2 2002-01-13 1.9 Thursday
3 2002-01-13 9.0 Saturday
4 2002-01-14 0.2 Wednesday
5 2002-01-14 5.1 Thursday
6 2002-01-14 7.0 Friday
7 2002-01-14 1.9 Saturday
Upvotes: 1