eemilk

Reputation: 1628

Keep first occurrence of value pattern in column

I have a dataframe like this:

import pandas as pd

time = ["2020-11-24 08:34:25.963422", "2020-11-24 08:34:25.963469", "2020-11-24 08:34:25.963681", "2020-11-24 08:34:27.051432",
        "2020-11-24 08:34:27.051855", "2020-11-24 08:34:52.793429", "2020-11-24 08:34:52.793465", "2020-11-24 08:34:52.793641",
        "2020-11-24 08:34:53.880143", "2020-11-24 08:34:53.880541", "2020-11-24 08:35:04.853417", "2020-11-24 08:35:04.853450",
        "2020-11-24 08:35:04.853605"]

name = ["request", "request", "request", "complete", "complete", "request", "request", "request", "complete", "complete", "request", "request", "request"]

data = {"time": time, "name": name}

df = pd.DataFrame(data)


    time                        name
0   2020-11-24 08:34:25.963422  request
1   2020-11-24 08:34:25.963469  request
2   2020-11-24 08:34:25.963681  request
3   2020-11-24 08:34:27.051432  complete
4   2020-11-24 08:34:27.051855  complete
5   2020-11-24 08:34:52.793429  request
6   2020-11-24 08:34:52.793465  request
7   2020-11-24 08:34:52.793641  request
8   2020-11-24 08:34:53.880143  complete
9   2020-11-24 08:34:53.880541  complete
10  2020-11-24 08:35:04.853417  request
11  2020-11-24 08:35:04.853450  request
12  2020-11-24 08:35:04.853605  request

I want to keep only the first row of each consecutive run of request and of complete, so the output would look like this:

    time                        name
0   2020-11-24 08:34:25.963422  request
1   2020-11-24 08:34:27.051432  complete
2   2020-11-24 08:34:52.793429  request
3   2020-11-24 08:34:53.880143  complete
4   2020-11-24 08:35:04.853417  request

I already tried using iloc and slicing the dataframe, but I did not get anything useful. I could loop and count the occurrences row by row, but I think there must be a more efficient method.

Upvotes: 1

Views: 84

Answers (1)

sacuL

Reputation: 51335

Probably the easiest way is to keep only the rows where name doesn't equal the previous row's name (shift() moves the column down by one row, so each value is compared with the one before it), and use loc to filter out the repeated rows:

df.loc[df.name.ne(df.name.shift())]

                          time      name
0   2020-11-24 08:34:25.963422   request
3   2020-11-24 08:34:27.051432  complete
5   2020-11-24 08:34:52.793429   request
8   2020-11-24 08:34:53.880143  complete
10  2020-11-24 08:35:04.853417   request
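
Note that this result keeps the original index labels (0, 3, 5, 8, 10), while the expected output in the question is numbered 0 to 4. One way to get that is to chain reset_index(drop=True). Below is a minimal sketch on a shortened, made-up frame (the time values t0..t5 are placeholders, not the question's data):

```python
import pandas as pd

# Hypothetical toy frame: placeholder times, same request/complete pattern.
df = pd.DataFrame({
    "time": ["t0", "t1", "t2", "t3", "t4", "t5"],
    "name": ["request", "request", "complete", "complete", "request", "request"],
})

# True where a row's name differs from the previous row's name.
# shift() moves values down by one, so the first row is compared
# with NaN and is therefore always kept.
starts = df["name"].ne(df["name"].shift())

# Keep the first row of each run and renumber the index from 0.
result = df.loc[starts].reset_index(drop=True)
print(result)
```

Because the comparison is only against the immediately preceding row, this keeps the first row of every consecutive run, which is different from drop_duplicates, which would keep only one request and one complete overall.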

Upvotes: 2
