simon

Reputation: 690

Remove rows that are not relevant for my data set

Consider the following pandas dataframe which contains info about the usage of the public pool by a single user.

He can only use the service twice per week, so once he reaches 2 usages, the rest of the days within the same week are automatically 0. How can I delete these irrelevant rows, i.e., all rows where df['Attended?']==0 because the threshold has already been reached?

Week  Day  Attended?
   1    1          0
   1    2          1
   1    3          1
   1    4          0
   1    5          0
   2    1          1
   2    2          1
   2    3          0
   2    4          0
   2    5          0
   3    1          0
   3    2          0
   3    3          0
   3    4          0
   3    5          1

As an example, the expected output would delete days 4 and 5 from week 1, since the threshold is reached on day 3. In the same manner, the threshold is reached on day 2 of week 2, so days 3, 4 and 5 are deleted.

Week  Day  Attended?
   1    1          0
   1    2          1
   1    3          1
   2    1          1
   2    2          1
   3    1          0
   3    2          0
   3    3          0
   3    4          0
   3    5          1
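For reference, this example data can be built as a DataFrame like so (a minimal sketch; column names follow the table above):

```python
import pandas as pd

# Reproduce the example: 3 weeks x 5 days, with the attendance pattern above
df = pd.DataFrame({
    'Week': [1] * 5 + [2] * 5 + [3] * 5,
    'Day': list(range(1, 6)) * 3,
    'Attended?': [0, 1, 1, 0, 0,
                  1, 1, 0, 0, 0,
                  0, 0, 0, 0, 1],
})
```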

Upvotes: 0

Views: 104

Answers (1)

ALollz

Reputation: 59569

Form a Boolean mask with a grouped cumsum to find the 0s that occur after the first 2 attendances, then subset with .loc:

m = df.groupby('Week')['Attended?'].cumsum().ge(2) & df['Attended?'].eq(0)
df.loc[~m]

    Week  Day  Attended?
0      1    1          0
1      1    2          1
2      1    3          1
5      2    1          1
6      2    2          1
10     3    1          0
11     3    2          0
12     3    3          0
13     3    4          0
14     3    5          1
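To see why the mask works, here is a quick sketch of the intermediate values on the same data (df built as in the question): the per-week cumulative sum marks exactly the rows where 2 attendances have already been counted.

```python
import pandas as pd

df = pd.DataFrame({
    'Week': [1] * 5 + [2] * 5 + [3] * 5,
    'Day': list(range(1, 6)) * 3,
    'Attended?': [0, 1, 1, 0, 0,
                  1, 1, 0, 0, 0,
                  0, 0, 0, 0, 1],
})

# Running attendance count, restarting each week
cumsum = df.groupby('Week')['Attended?'].cumsum()
print(cumsum.tolist())
# [0, 1, 2, 2, 2, 1, 2, 2, 2, 2, 0, 0, 0, 0, 1]

# Rows to drop: count already >= 2 AND the day itself was not attended
m = cumsum.ge(2) & df['Attended?'].eq(0)
print(df.loc[~m].index.tolist())
# [0, 1, 2, 5, 6, 10, 11, 12, 13, 14]
```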

To do this with a variable threshold per week, as in your comments, map the thresholds onto the DataFrame, then use the same logic, checking against the created column instead of a scalar value.

thresh = [1, 2, 2]  # Usages per week to drop after
week = [1, 2, 3]

df['threshold'] = df['Week'].map(dict(zip(week, thresh)))

m = df.groupby('Week')['Attended?'].cumsum().ge(df['threshold']) & df['Attended?'].eq(0)
df.loc[~m]

    Week  Day  Attended?  threshold
0      1    1          0          1
1      1    2          1          1
2      1    3          1          1
5      2    1          1          2
6      2    2          1          2
10     3    1          0          2
11     3    2          0          2
12     3    3          0          2
13     3    4          0          2
14     3    5          1          2
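A variant of the same logic (a sketch, not part of the original answer): map the per-week thresholds inline in the comparison, which avoids adding a threshold column that may need dropping afterwards.

```python
import pandas as pd

df = pd.DataFrame({
    'Week': [1] * 5 + [2] * 5 + [3] * 5,
    'Day': list(range(1, 6)) * 3,
    'Attended?': [0, 1, 1, 0, 0,
                  1, 1, 0, 0, 0,
                  0, 0, 0, 0, 1],
})

# Per-week thresholds supplied as a dict and mapped inline,
# so no extra column is created on df
thresholds = {1: 1, 2: 2, 3: 2}
m = (df.groupby('Week')['Attended?'].cumsum()
       .ge(df['Week'].map(thresholds))
     & df['Attended?'].eq(0))
out = df.loc[~m]
```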

Upvotes: 6
