E. Zeytinci
E. Zeytinci

Reputation: 2643

How do I compare the previous row and change the value of another column with group by in Pandas?

I have a dataframe like this,

>>> import pandas as pd
>>> data = {
    'code': ['123456', '123456', '123456', '123abc', '123abc', '123abc'],
    'price': [112.58, 112.58, 113.25, 120.31, 120.36, 140.12],
}... ... ...
>>> all_df = pd.DataFrame(data)
>>> all_df
     code   price
0  123456  112.58
1  123456  112.58
2  123456  113.25
3  123abc  120.31
4  123abc  120.36
5  123abc  140.12

I want to create a column named period and increase the period when the price is not the same value. And each transaction should be made according to the unique code. Here is expected output,

     code   price  period
0  123456  112.58     1.0
1  123456  112.58     1.0
2  123456  113.25     2.0
3  123abc  112.58     1.0
4  123abc  112.58     1.0
5  123abc  113.25     2.0

I managed to do it this way, but it takes too long when the data is too much.

def get_period(df):
    period = df.loc[0, 'period'] = 1
    df = df.reset_index(drop=True)

    for i in range(1, len(df)):
        if df.loc[i, 'price'] != df.loc[i - 1, 'price']:
            df.loc[i, 'period'] = period + 1
            period += 1
        else:
            df.loc[i, 'period'] = period

    return df

all_df.groupby('code').apply(get_period).reset_index(drop=True)

Any idea? Thanks in advance.

Upvotes: 2

Views: 94

Answers (1)

jezrael
jezrael

Reputation: 862581

First are tested consecutive price values by Series.shift with Series.cumsum and then for each value call factorize per groups:

df['period'] = (df.assign(new = df['price'].ne(df['price'].shift()).cumsum())
                  .groupby('code')['new']
                  .transform(lambda x: pd.factorize(x)[0]) + 1)
print (df)
     code   price  period
0  123456  112.58       1
1  123456  112.58       1
2  123456  113.25       2
3  123abc  112.58       1
4  123abc  112.58       1
5  123abc  113.25       2

Upvotes: 3

Related Questions