Tie_24
Tie_24

Reputation: 647

Select rows in groups with conditions

             A    D          E
0   2002-01-13  3.3    Tuesday
1   2002-01-13  3.9  Wednesday
2   2002-01-13  1.9   Thursday
3   2002-01-13  9.0   Saturday
4   2002-01-14  0.9    Tuesday
5   2002-01-14  0.2  Wednesday
6   2002-01-14  5.1   Thursday
7   2002-01-14  7.0     Friday
8   2002-01-14  1.9   Saturday
9   2002-01-15  4.2    Tuesday
10  2002-01-15  6.7  Wednesday
11  2002-01-15  1.2     Friday
12  2002-01-15  0.6   Saturday

I have a data frame above sorted by date (column A). Column E holds day-of-week references. I need a new df, filtering rows by the following conditions.

Output should be:

             A    D          E
0   2002-01-13  3.3    Tuesday
1   2002-01-13  3.9  Wednesday
2   2002-01-13  1.9   Thursday
3   2002-01-13  9.0   Saturday
5   2002-01-14  0.2  Wednesday
6   2002-01-14  5.1   Thursday
7   2002-01-14  7.0     Friday
8   2002-01-14  1.9   Saturday
9   2002-01-15  4.2    Tuesday
10  2002-01-15  6.7  Wednesday
11  2002-01-15  1.2     Friday
12  2002-01-15  0.6   Saturday

I´ve tried:

m1 = (group["E"] == "Wednesday")
m2 = (group["E"] == "Thursday")

grouped = df.groupby("A")

for idx, group in grouped:

    if (m1|m2).any():
        df[idx] = group[m1|m2]
    else:
        df[idx] = group[m2]

Upvotes: 0

Views: 54

Answers (2)

Scott Boston
Scott Boston

Reputation: 153500

You can try this using np.where and isin:

df[df.groupby('A')['E']
     .transform(lambda x: np.where(x.eq('Thursday').any() & x.eq('Friday').any(),
                               x.isin(['Wednesday','Thursday','Friday','Saturday']),
                               x.isin(['Tuesday','Wednesday','Thursday','Friday','Saturday'])))]

Output:

             A    D          E
0   2002-01-13  3.3    Tuesday
1   2002-01-13  3.9  Wednesday
2   2002-01-13  1.9   Thursday
3   2002-01-13  9.0   Saturday
5   2002-01-14  0.2  Wednesday
6   2002-01-14  5.1   Thursday
7   2002-01-14  7.0     Friday
8   2002-01-14  1.9   Saturday
9   2002-01-15  4.2    Tuesday
10  2002-01-15  6.7  Wednesday
11  2002-01-15  1.2     Friday
12  2002-01-15  0.6   Saturday

Upvotes: 1

YOLO
YOLO

Reputation: 21739

Using a simple custom function on groupby gets the answer:

# helper function
def get_grp(grp):

    # values to consider
    a = ['Thursday', 'Friday']

    # set up condition
    cond = all(True if i in grp['E'].tolist() else False for i in a)

    # check conditions
    if cond:
        return grp[grp['E'].isin(['Wednesday','Thursday','Friday','Saturday'])]
    else:
        return grp[grp['E'].isin(['Tuesday','Wednesday','Thursday','Friday','Saturday'])]

## apply function on groupby object

df = df.groupby('A',as_index=False).apply(get_grp).reset_index(drop=True)

## output
df.head(7)

    A   D   E
0   2002-01-13  3.3     Tuesday
1   2002-01-13  3.9     Wednesday
2   2002-01-13  1.9     Thursday
3   2002-01-13  9.0     Saturday
4   2002-01-14  0.2     Wednesday
5   2002-01-14  5.1     Thursday
6   2002-01-14  7.0     Friday
7   2002-01-14  1.9     Saturday

Upvotes: 1

Related Questions