Reputation: 1628
I have a dataframe like this:
import pandas as pd
time = [
    "2020-11-24 08:34:25.963422", "2020-11-24 08:34:25.963469", "2020-11-24 08:34:25.963681", "2020-11-24 08:34:27.051432",
    "2020-11-24 08:34:27.051855", "2020-11-24 08:34:52.793429", "2020-11-24 08:34:52.793465", "2020-11-24 08:34:52.793641",
    "2020-11-24 08:34:53.880143", "2020-11-24 08:34:53.880541", "2020-11-24 08:35:04.853417", "2020-11-24 08:35:04.853450",
    "2020-11-24 08:35:04.853605",
]
name = ["request", "request", "request", "complete", "complete", "request", "request", "request", "complete", "complete", "request", "request", "request"]
data = {"time": time, "name": name}
df = pd.DataFrame(data)
time name
0 2020-11-24 08:34:25.963422 request
1 2020-11-24 08:34:25.963469 request
2 2020-11-24 08:34:25.963681 request
3 2020-11-24 08:34:27.051432 complete
4 2020-11-24 08:34:27.051855 complete
5 2020-11-24 08:34:52.793429 request
6 2020-11-24 08:34:52.793465 request
7 2020-11-24 08:34:52.793641 request
8 2020-11-24 08:34:53.880143 complete
9 2020-11-24 08:34:53.880541 complete
10 2020-11-24 08:35:04.853417 request
11 2020-11-24 08:35:04.853450 request
12 2020-11-24 08:35:04.853605 request
I want to keep only the first row of each consecutive run of request and complete, so the output would look like this:
time name
0 2020-11-24 08:34:25.963422 request
1 2020-11-24 08:34:27.051432 complete
2 2020-11-24 08:34:52.793429 request
3 2020-11-24 08:34:53.880143 complete
4 2020-11-24 08:35:04.853417 request
I already tried using iloc and slicing the dataframe, but I didn't get anything useful. I could loop over the rows and count the occurrences one by one, but I think there must be a more efficient method.
Upvotes: 1
Views: 84
Reputation: 51335
Probably the easiest way is to select just the rows where name doesn't equal the previous row's name (shift() moves the column down by one row), and use loc to filter out the consecutive duplicates:
df.loc[df.name.ne(df.name.shift())]
time name
0 2020-11-24 08:34:25.963422 request
3 2020-11-24 08:34:27.051432 complete
5 2020-11-24 08:34:52.793429 request
8 2020-11-24 08:34:53.880143 complete
10 2020-11-24 08:35:04.853417 request
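Note that the result above keeps the original index labels (0, 3, 5, 8, 10), while the desired output in the question is numbered 0 to 4. A minimal sketch of the same approach with the index renumbered is below. The helper name first_of_each_run and the shortened sample data are only for illustration; they are not part of the original question or of pandas.

```python
import pandas as pd


def first_of_each_run(frame, column="name"):
    """Keep the first row of every consecutive run of equal values in `column`.

    Hypothetical helper for illustration only.
    """
    # shift() moves the values down one row, so each row is compared with
    # the row before it. The first row is compared with NaN, so it is kept.
    starts_new_run = frame[column].ne(frame[column].shift())
    # reset_index(drop=True) renumbers the kept rows 0..n-1.
    return frame.loc[starts_new_run].reset_index(drop=True)


# Shortened sample taken from the rows in the question.
df = pd.DataFrame({
    "time": [
        "2020-11-24 08:34:25.963422", "2020-11-24 08:34:25.963469",
        "2020-11-24 08:34:27.051432", "2020-11-24 08:34:27.051855",
        "2020-11-24 08:34:52.793429", "2020-11-24 08:34:52.793465",
    ],
    "name": ["request", "request", "complete", "complete", "request", "request"],
})

result = first_of_each_run(df)
print(result)
```

This assumes the rows are already in chronological order, as they are in the question. If they are not, sorting by the time column first should be enough.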
Upvotes: 2