Reputation: 995
I am looking for a simple way to add a column to a dataframe that indicates whether a given part has been purchased for at least two years in a row.
Here is an example dataframe:
import pandas as pd

df = pd.DataFrame(
    {
        'PART_UNIT': ['A', 'A', 'A', 'B', 'B'],
        'FiscalYear': ['2015/2016', '2016/2017', '2018/2019', '2015/2016', '2017/2018'],
        'PO_UNIT_PRICE': [10, 12, 11, 45, 54]
    }
)
print(df)
PART_UNIT FiscalYear PO_UNIT_PRICE
0 A 2015/2016 10
1 A 2016/2017 12
2 A 2018/2019 11
3 B 2015/2016 45
4 B 2017/2018 54
I am looking for a function like the one I would use when adding a standard deviation column,
df['std'] = df.groupby(['PART_UNIT'])['PO_UNIT_PRICE'].transform(np.std)
to get a result like this:
PART_UNIT FiscalYear PO_UNIT_PRICE Concurrent
0 A 2015/2016 10 1
1 A 2016/2017 12 1
2 A 2018/2019 11 1
3 B 2015/2016 45 0
4 B 2017/2018 54 0
As you can see, part "B" gets a 0 in the column because it has not been purchased two years in a row.
Upvotes: 2
Views: 281
Reputation: 23815
import pandas as pd
df = pd.DataFrame(
    {
        'PART_UNIT': ['A', 'A', 'A', 'B', 'B'],
        'FiscalYear': ['2015/2016', '2016/2017', '2018/2019', '2015/2016', '2017/2018'],
        'PO_UNIT_PRICE': [10, 12, 11, 45, 54]
    }
)
print(df)
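def two_years_in_a_row(fiscal_years):
    # Assumes the rows of each part are already in chronological order.
    tmp = list(fiscal_years)
    for idx, year in enumerate(tmp):
        if idx > 0:
            # Consecutive when the previous label ends in the year this one starts with.
            if tmp[idx - 1].split('/')[1] == year.split('/')[0]:
                return 1
    return 0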
print('----------------------------------------')
df['concurrent'] = df.groupby(['PART_UNIT'])['FiscalYear'].transform(two_years_in_a_row)
print(df)
Output:
PART_UNIT FiscalYear PO_UNIT_PRICE
0 A 2015/2016 10
1 A 2016/2017 12
2 A 2018/2019 11
3 B 2015/2016 45
4 B 2017/2018 54
----------------------------------------
PART_UNIT FiscalYear PO_UNIT_PRICE concurrent
0 A 2015/2016 10 1
1 A 2016/2017 12 1
2 A 2018/2019 11 1
3 B 2015/2016 45 0
4 B 2017/2018 54 0
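The function above compares each row with the one directly before it, so it relies on the rows of each part being in chronological order. If that cannot be guaranteed, a possible variant is sketched below. It assumes every FiscalYear label has the form 'YYYY/YYYY'; the helper name has_consecutive_years is made up for this illustration, and the rows for part A are shuffled on purpose.

```python
import pandas as pd

# Same data as above, but with the rows for part A deliberately out of order.
df = pd.DataFrame(
    {
        'PART_UNIT': ['A', 'A', 'A', 'B', 'B'],
        'FiscalYear': ['2018/2019', '2016/2017', '2015/2016', '2015/2016', '2017/2018'],
        'PO_UNIT_PRICE': [11, 12, 10, 45, 54],
    }
)


def has_consecutive_years(fiscal_years):
    """Return 1 if two fiscal years in the group follow each other, else 0.

    Hypothetical helper; assumes labels look like 'YYYY/YYYY'.
    """
    # Collect the starting year of every label; a set makes row order irrelevant.
    start_years = {int(label.split('/')[0]) for label in fiscal_years}
    # Two fiscal years are consecutive when both start and start + 1 are present.
    return int(any(year + 1 in start_years for year in start_years))


# transform broadcasts the per-group scalar back onto every row of that group.
df['concurrent'] = df.groupby('PART_UNIT')['FiscalYear'].transform(has_consecutive_years)
print(df)
```

With this data, every row of part A should get 1 (2015/2016 is followed by 2016/2017) and every row of part B should get 0, regardless of row order.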
Upvotes: 1