Reputation: 2643
I have a dataframe like this,
>>> import pandas as pd
>>> data = {
'code': ['123456', '123456', '123456', '123abc', '123abc', '123abc'],
'price': [112.58, 112.58, 113.25, 120.31, 120.36, 140.12],
}... ... ...
>>> all_df = pd.DataFrame(data)
>>> all_df
code price
0 123456 112.58
1 123456 112.58
2 123456 113.25
3 123abc 120.31
4 123abc 120.36
5 123abc 140.12
I want to create a column named period and increase the period when the price is not the same value. And each transaction should be made according to the unique code. Here is expected output,
code price period
0 123456 112.58 1.0
1 123456 112.58 1.0
2 123456 113.25 2.0
3 123abc 112.58 1.0
4 123abc 112.58 1.0
5 123abc 113.25 2.0
I managed to do it this way, but it takes too long when the data is too much.
def get_period(df):
period = df.loc[0, 'period'] = 1
df = df.reset_index(drop=True)
for i in range(1, len(df)):
if df.loc[i, 'price'] != df.loc[i - 1, 'price']:
df.loc[i, 'period'] = period + 1
period += 1
else:
df.loc[i, 'period'] = period
return df
all_df.groupby('code').apply(get_period).reset_index(drop=True)
Any idea? Thanks in advance.
Upvotes: 2
Views: 94
Reputation: 862581
First are tested consecutive price
values by Series.shift
with Series.cumsum
and then for each value call factorize
per groups:
df['period'] = (df.assign(new = df['price'].ne(df['price'].shift()).cumsum())
.groupby('code')['new']
.transform(lambda x: pd.factorize(x)[0]) + 1)
print (df)
code price period
0 123456 112.58 1
1 123456 112.58 1
2 123456 113.25 2
3 123abc 112.58 1
4 123abc 112.58 1
5 123abc 113.25 2
Upvotes: 3