Reputation: 71
I have a Pandas dataframe with three columns, like this:
Time | Code | Id |
---|---|---|
10:10:00 | Rx | 11 |
10:10:01 | Tx | 11 |
10:10:02 | Rx | 12 |
10:10:04 | Tx | 12 |
10:10:06 | Rx | 13 |
10:10:07 | Tx | 13 |
10:10:08 | Rx | 11 |
10:10:10 | Rx | 11 |
For each `Rx` code, I want to check whether there is a `Tx` code just after it, and whether the `Id` is the same for the `Rx` and the `Tx`.
If an `Rx` is duplicated, I want to get that row.
In my example I want to throw out the 10:10:10 Rx because it's duplicated.
I managed to do it with a for loop, but I shouldn't use a for loop with a DataFrame:
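    old_cell = None
    for index, row in pdo_df.iterrows():
        if old_cell is None:
            # First row: there is no previous row to compare against yet
            old_cell = row
            continue
        # Same code twice in a row means an Rx (or Tx) was repeated
        if row['Function_code'] == old_cell['Function_code']:
            print("----------------")
            print("Error :")
            print(old_cell)
            print(row)
            print("----------------")
        old_cell = row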
My ideal output would be:
Time | Code | Id |
---|---|---|
10:10:08 | Rx | 11 |
Because this `Rx` message is duplicated (there is no `Tx` after it).
Upvotes: 0
Views: 142
Reputation: 944
The `shift` method lets you look at the value in the previous row. This code then detects all the duplicates, meaning every row whose `Code` and `Id` both match the row just before it:
    df[
        (df["Code"] == df["Code"].shift()) &
        (df["Id"] == df["Id"].shift())
    ]
Following the same logic, if you negate that condition with `~`, you get your dataframe without those duplicates:
    df[
        ~((df["Code"] == df["Code"].shift()) &
          (df["Id"] == df["Id"].shift()))
    ]
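To see which rows each expression picks, here is a minimal sketch run on the sample data from the question. It assumes the columns are named `Time`, `Code` and `Id` as in the table (the loop in the question uses `Function_code`, so adjust the name to your real frame). The variable names `same_as_prev` and `same_as_next` are only illustrative. Note that comparing with `shift()` flags the second row of a repeated pair (10:10:10), while comparing with `shift(-1)`, which looks at the next row instead, flags the first one (10:10:08), which seems to be the row shown in the ideal output.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "Time": ["10:10:00", "10:10:01", "10:10:02", "10:10:04",
             "10:10:06", "10:10:07", "10:10:08", "10:10:10"],
    "Code": ["Rx", "Tx", "Rx", "Tx", "Rx", "Tx", "Rx", "Rx"],
    "Id": [11, 11, 12, 12, 13, 13, 11, 11],
})

# True where a row has the same Code and Id as the PREVIOUS row
same_as_prev = (df["Code"] == df["Code"].shift()) & (df["Id"] == df["Id"].shift())

# True where a row has the same Code and Id as the NEXT row
same_as_next = (df["Code"] == df["Code"].shift(-1)) & (df["Id"] == df["Id"].shift(-1))

repeated = df[same_as_prev]       # second row of each repeated pair
unanswered = df[same_as_next]     # first row of each repeated pair
deduplicated = df[~same_as_prev]  # frame with the repeats dropped

print(repeated)
print(unanswered)
print(deduplicated)
```

If the same message can repeat three or more times in a row, `same_as_next` would flag every row of the run except the last one, so check which of the two masks matches what you need.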
Upvotes: 2