Simple way to identify concurrent events in pandas dataframe

Question

I am looking for a simple way to add a column to a dataframe that indicates if a given part has been purchased for at least two years in a row

Here is an example dataframe

df = pd.DataFrame(
    {
        'PART_UNIT':['A','A','A','B','B'],
        'FiscalYear':['2015/2016','2016/2017','2018/2019','2015/2016','2017/2018'],
        'PO_UNIT_PRICE':[10,12,11,45,54]
    }
)
print(df)

0         A  2015/2016             10
1         A  2016/2017             12
2         A  2018/2019             11
3         B  2015/2016             45
4         B  2017/2018             54

I am looking for a function like the one I would use when adding a standard deviation column

df['std'] = df.groupby(['PART_UNIT'])['PO_UNIT_PRICE'].transform(np.std)

to get a result like this

  PART_UNIT FiscalYear  PO_UNIT_PRICE  Concurrent
0         A  2015/2016             10           1
1         A  2016/2017             12           1
2         A  2018/2019             11           1
3         B  2015/2016             45           0
4         B  2017/2018             54           0

As you can see part "B" gets a 0 in the column, as it has not been purchased two years in a row.

balderman · Accepted Answer

import pandas as pd

df = pd.DataFrame(
    {
        'PART_UNIT': ['A', 'A', 'A', 'B', 'B'],
        'FiscalYear': ['2015/2016', '2016/2017', '2018/2019', '2015/2016', '2017/2018'],
        'PO_UNIT_PRICE': [10, 12, 11, 45, 54]
    }
)

print(df)


def two_years_in_a_row(fiscal_years):
    tmp = list(fiscal_years)
    for idx, year in enumerate(tmp):
        if idx > 0:
            if tmp[idx - 1].split('/')[1] == year.split('/')[0]:
                return 1
    return 0


print('----------------------------------------')

df['concurrent'] = df.groupby(['PART_UNIT'])['FiscalYear'].transform(two_years_in_a_row)

print(df)

output

 PART_UNIT FiscalYear  PO_UNIT_PRICE
0         A  2015/2016             10
1         A  2016/2017             12
2         A  2018/2019             11
3         B  2015/2016             45
4         B  2017/2018             54
----------------------------------------
  PART_UNIT FiscalYear  PO_UNIT_PRICE  concurrent
0         A  2015/2016             10           1
1         A  2016/2017             12           1
2         A  2018/2019             11           1
3         B  2015/2016             45           0
4         B  2017/2018             54           0

Simple way to identify concurrent events in pandas dataframe

Answers (1)

Related Questions