darkpool
darkpool

Reputation: 14641

Pandas - Flag when one item is different

I have the following dataframe:

date,         item,         value
2015-01-20    abc            1
2015-01-20    xyz           -1
2015-01-20    fgh            1
2015-01-20    iop           -1
2015-01-21    abc            1
2015-01-21    xyz            1
2015-01-21    fgh           -1
2015-01-21    iop            1

I need to go through each day and flag an item if it is the only item for that day which has a different value to the rest. They could have any values, the point is to identify when there is only one item that differs from the rest for each day.

For example on 2015-01-20 two items have value 1 and two items have value -1, therefore there is nothing to flag. However on 2015-01-21, only one item has value -1 while the rest have value 1 therefore I need to flag item 'fgh'.

The output of the above would therefore be:

date,         item,         value    flag
2015-01-20    abc            1
2015-01-20    xyz           -1
2015-01-20    fgh            1
2015-01-20    iop           -1
2015-01-21    abc            1
2015-01-21    xyz            1
2015-01-21    fgh           -1       True
2015-01-21    iop            1

Im not aware of anything built into pandas to do this? If not, is there a reasonably elegant way to accomplish this.

Edit

To simplify, I would be able to ensure that the value column is either positive or negative meaning that I would simply need to identify when one value is positive while the rest are negative, or vice-versa.

Upvotes: 2

Views: 879

Answers (1)

maxymoo
maxymoo

Reputation: 36545

You can do it in two steps:

First use a groupby to get the aggregate function that you want, in this case a flag against each date and value:

singleton_values = (df.groupby(['date,','value'])['item,'].count()==1)
singleton_values = singleton_values.reset_index()
singleton_values.rename(columns = {'item,':'flag'}, inplace= True)        
singleton_values

        date,  value   flag
0  2015-01-20     -1  False
1  2015-01-20      1  False
2  2015-01-21     -1   True
3  2015-01-21      1  False

Then merge back to the original dataframe to have your aggregates displayed against each row:

df = pd.merge(df,singleton_values)
df

        date, item,  value   flag
0  2015-01-20   abc      1  False
1  2015-01-20   fgh      1  False
2  2015-01-20   xyz     -1  False
3  2015-01-20   iop     -1  False
4  2015-01-21   abc      1  False
5  2015-01-21   xyz      1  False
6  2015-01-21   iop      1  False
7  2015-01-21   fgh     -1   True

Upvotes: 1

Related Questions