Python Pandas check if a value occurs more than once in the same day

Question

I have a Pandas dataframe like the one below. I want to check whether a station has the value yyy together with any other value in variable1 on the same day (as station1 does). If so, I need to delete the whole row containing yyy.

Currently I do this with iterrows(): I loop over the rows to find the days on which yyy appears, change the value to something like "delete me", build a new dataframe from the result (because pandas doesn't support replacing in place), and filter the new dataframe to get rid of the unwanted rows. This works for now because my dataframes are small, but it is unlikely to scale.

Question: This seems like a very "non-Pandas" way to do this. Is there a better method for removing the unwanted rows?

                dateuse         station         variable1
0   2012-08-12 00:00:00        station1               xxx
1   2012-08-12 00:00:00        station1               yyy
2   2012-08-23 00:00:00        station2               aaa
3   2012-08-23 00:00:00        station3               bbb
4   2012-08-25 00:00:00        station4               ccc
5   2012-08-25 00:00:00        station4               ccc
6   2012-08-25 00:00:00        station4               ccc

DSM · Accepted Answer

I might index using a boolean array. We want to delete rows (if I understand what you're after, anyway!) which have yyy and belong to a dateuse/station combination with more than one row.

We can use transform to broadcast the size of each dateuse/station group back to the length of the dataframe, and then flag the rows in groups which have length > 1. Then we can & this with a mask of where the yyys are, and use ~ to keep every row that is not flagged by both.

>>> multiple = df.groupby(["dateuse", "station"])["variable1"].transform(len) > 1
>>> must_be_isolated = df["variable1"] == "yyy"
>>> df[~(multiple & must_be_isolated)]
               dateuse   station variable1
0  2012-08-12 00:00:00  station1       xxx
2  2012-08-23 00:00:00  station2       aaa
3  2012-08-23 00:00:00  station3       bbb
4  2012-08-25 00:00:00  station4       ccc
5  2012-08-25 00:00:00  station4       ccc
6  2012-08-25 00:00:00  station4       ccc
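
For anyone who wants to run this outside the interpreter session, here is a minimal self-contained sketch of the same approach. The dataframe is rebuilt from the sample in the question; the names `groups`, `is_yyy`, `has_other_value`, `filtered` and `filtered_strict` are my own and not part of the original answer. It also includes a stricter variant, assuming you only want to drop yyy when the group really contains a different value (using `nunique` rather than group size), which may or may not match what you need.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame(
    {
        "dateuse": pd.to_datetime(
            ["2012-08-12", "2012-08-12", "2012-08-23", "2012-08-23",
             "2012-08-25", "2012-08-25", "2012-08-25"]
        ),
        "station": ["station1", "station1", "station2", "station3",
                    "station4", "station4", "station4"],
        "variable1": ["xxx", "yyy", "aaa", "bbb", "ccc", "ccc", "ccc"],
    }
)

groups = df.groupby(["dateuse", "station"])["variable1"]

# True for every row whose dateuse/station group has more than one row.
multiple = groups.transform("size") > 1

# True for every row whose value is yyy.
is_yyy = df["variable1"] == "yyy"

# Keep everything except yyy rows that share a group with other rows.
filtered = df[~(multiple & is_yyy)]

# Assumed stricter variant: only drop yyy when the group holds at least
# two distinct values, so a group made up solely of yyy rows is kept.
has_other_value = groups.transform("nunique") > 1
filtered_strict = df[~(has_other_value & is_yyy)]

print(filtered)
```

On the sample data both variants keep the rows with index 0, 2, 3, 4, 5 and 6. They only differ when a dateuse/station group contains yyy several times and nothing else.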
