Reputation: 291
Imagine we have a dataframe:
num line
0 1 56
1 1 90
2 2 66
3 3 4
4 3 55
5 3 104
6 1 23
7 5 22
8 3 144
I want to remove the rows where specifically a 3 is repeated in the num column, and keep the first. So the two rows with repeating 1's in the num column should still be in the resulting DataFrame together with all the other columns.
What I have so far, which removes every double value, not only the 3's:
data.groupby((data['num'] != data['num'].shift()).cumsum().values).first()
Expected result or correct code:
num line
0 1 56
1 1 90
2 2 66
3 3 4
4 1 23
5 5 22
6 3 144
Upvotes: 1
Views: 61
Reputation: 862591
Use:
df = data[data['num'].ne(3) | data['num'].ne(data['num'].shift())]
print (df)
num line
0 1 56
1 1 90
2 2 66
3 3 4
6 1 23
7 5 22
8 3 144
Detail:
Compare for not equal:
print (data['num'].ne(3))
0 True
1 True
2 True
3 False
4 False
5 False
6 True
7 True
8 False
Name: num, dtype: bool
Compare by shifted values for first consecutive:
print (data['num'].ne(data['num'].shift()))
0 True
1 False
2 True
3 True
4 False
5 False
6 True
7 True
8 True
Name: num, dtype: bool
Chain by |
for bitwise OR
:
print (data['num'].ne(3) | data['num'].ne(data['num'].shift()))
0 True
1 True
2 True
3 True
4 False
5 False
6 True
7 True
8 True
Name: num, dtype: bool
Upvotes: 3
Reputation: 88236
You could use the bellow conditions in order to perform boolean indexation in the dataframe:
# True where num is 3
c1 = df['num'].eq(3)
# True where num is repeated
c2 = df['num'].eq(df['num'].shift(1))
# boolean indexation on df
df[(c1 & ~c2) | ~(c1)]
num line
0 1 56
1 1 90
2 2 66
3 3 4
6 1 23
7 5 22
8 3 144
Details
df.assign(is_3=c1, is_repeated=c2, filtered=(c1 & ~c2) | ~(c1))
num line is_3 is_repeated filtered
0 1 56 False False True
1 1 90 False True True
2 2 66 False False True
3 3 4 True False True
4 3 55 True True False
5 3 104 True True False
6 1 23 False False True
7 5 22 False False True
8 3 144 True False True
Upvotes: 2