Reputation: 55
I would like to drop rows that meet a certain condition, but I do not want to drop the first row even if it meets that condition.
I tried the df.drop function, but it removes the first row when the first row matches the condition, which is not what I want.
Data looks something like this:
Column1 Column2 Column3
1 3 A
2 1 B
3 3 C
4 1 D
5 1 E
6 3 F
I want to drop every row that has a value of 3 in Column2, but keep the first row even though it also has a 3 in Column2. The new data should look like this:
Column1 Column2 Column3
1 3 A
2 1 B
4 1 D
5 1 E
Upvotes: 3
Views: 1862
Reputation: 402513
You can make "retain first row" a part of your condition for dropping/keeping rows.
The condition for keeping rows would be <ORIGINAL CONDITION> or <CONDITION TO KEEP FIRST ROW>. In code, this is:
# (condition to keep non-3 rows) | (condition to keep the 0th row)
df[(df['Column2'] != 3) | (df.index == 0)]
Column1 Column2 Column3
0 1 3 A
1 2 1 B
3 4 1 D
4 5 1 E
Inversely, by De Morgan's laws, the condition for dropping rows would be (df['Column2'] == 3) & (df.index != 0). We then invert that condition to get our expected output:
df[~((df['Column2'] == 3) & (df.index != 0))]
Column1 Column2 Column3
0 1 3 A
1 2 1 B
3 4 1 D
4 5 1 E
These work assuming your index is a RangeIndex. If not, use pd.RangeIndex(len(df)) == 0 as the second condition instead.
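To illustrate the fallback, here is a sketch with a hypothetical string index (labels "a" through "f" are made up for the example), where df.index == 0 would match nothing, so we build a positional mask instead:

```python
import pandas as pd

# Same data as the question, but with string labels instead of a RangeIndex.
df = pd.DataFrame(
    {"Column1": [1, 2, 3, 4, 5, 6],
     "Column2": [3, 1, 3, 1, 1, 3],
     "Column3": list("ABCDEF")},
    index=list("abcdef"),
)

# pd.RangeIndex(len(df)) == 0 yields a positional boolean mask that is
# True only for the first row, regardless of what the index labels are.
out = df[(df["Column2"] != 3) | (pd.RangeIndex(len(df)) == 0)]
print(out["Column1"].tolist())  # [1, 2, 4, 5]
```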
Upvotes: 2
Reputation: 42916
Great answers have already been posted here, but just for your convenience: we can also use cumcount to rank each 3 we find and drop all but the first one:
df = df[~((df.groupby('Column2').Column2.cumcount() != 0) & (df.Column2 == 3))]
print(df)
Column1 Column2 Column3
0 1 3 A
1 2 1 B
3 4 1 D
4 5 1 E
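To see why this works, here is a sketch of the intermediate cumcount values on the sample data: cumcount numbers each row within its Column2 group starting at 0, so only the first 3 gets rank 0 and survives the filter.

```python
import pandas as pd

df = pd.DataFrame(
    {"Column1": [1, 2, 3, 4, 5, 6],
     "Column2": [3, 1, 3, 1, 1, 3],
     "Column3": list("ABCDEF")}
)

# Within-group running count: first 3 -> 0, second 3 -> 1, third 3 -> 2,
# and likewise for the 1s.
rank = df.groupby("Column2").Column2.cumcount()
print(rank.tolist())  # [0, 0, 1, 1, 2, 2]

# Drop every 3 whose within-group rank is nonzero, i.e. every 3 after the first.
out = df[~((rank != 0) & (df.Column2 == 3))]
print(out["Column1"].tolist())  # [1, 2, 4, 5]
```

Note that this keeps the first occurrence of 3 wherever it appears, which coincides with "keep the first row" only because the first row happens to contain a 3 here.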
Upvotes: 2
Reputation: 323236
I am using duplicated
df[(~df.Column2.duplicated())|df.Column2.ne(3)]
Column1 Column2 Column3
0 1 3 A
1 2 1 B
3 4 1 D
4 5 1 E
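As a sketch of the mechanics on the sample data: duplicated() marks every repeat of a value, so ~duplicated() is True only for the first occurrence of each Column2 value, and or-ing that with ne(3) keeps all non-3 rows plus the first 3.

```python
import pandas as pd

df = pd.DataFrame(
    {"Column1": [1, 2, 3, 4, 5, 6],
     "Column2": [3, 1, 3, 1, 1, 3],
     "Column3": list("ABCDEF")}
)

# True for every repeat of a Column2 value; the first 3 and the first 1
# are both False.
print(df.Column2.duplicated().tolist())
# [False, False, True, True, True, True]

# Keep a row if it is the first occurrence of its value OR is not a 3.
out = df[(~df.Column2.duplicated()) | df.Column2.ne(3)]
print(out["Column1"].tolist())  # [1, 2, 4, 5]
```

As with the cumcount approach, this keeps the first 3 wherever it occurs, not specifically row 0.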
Upvotes: 2