DDM
DDM

Reputation: 323

How to create new column that counts and reset based on a string value in another column

I have a dataframe that goes like this

    A               B       C       B_shifted   C_shifted   Trend
0   553.666667      533.50  574.00  NaN         NaN         Flat
1   590.818182      575.50  595.50  533.50      574.00      Up
2   531.333333      527.50  536.50  575.50      595.50      Down
3   562.000000      562.00  562.00  527.50      536.50      Up
4   551.857143      538.50  557.50  562.00      562.00      Down
5   592.000000      585.50  598.50  538.50      557.50      Up
6   511.000000      511.00  511.00  585.50      598.50      Down
7   564.333333      548.00  590.50  511.00      511.00      Up
8   574.333333      552.00  580.00  548.00      590.50      Flat
9   537.500000      513.25  574.50  552.00      580.00      Down
10  609.500000      582.25  636.75  513.25      574.50      Up
11  535.000000      531.00  565.00  582.25      636.75      Down
12  567.142857      539.50  588.50  531.00      565.00      Up
13  566.625000      546.25  594.25  539.50      588.50      Up
14  576.631579      556.00  598.00  546.25      594.25      Up
15  558.333333      538.00  584.00  556.00      598.00      Down

I am trying to create a column to count the number of times the trend happens. It will increment if the same trend persist (e.g. from Up to Up, Count becomes 2) and reset it back to 1 if the trend changes (e.g. from Up to down, Count becomes 1 again)

This is what I am trying to achieve

    A               B       C       B_shifted   C_shifted   Trend   Counter
0   553.666667      533.50  574.00  NaN         NaN         Flat    1
1   590.818182      575.50  595.50  533.50      574.00      Up      1
2   531.333333      527.50  536.50  575.50      595.50      Down    1
3   562.000000      562.00  562.00  527.50      536.50      Up      1
4   551.857143      538.50  557.50  562.00      562.00      Down    1
5   592.000000      585.50  598.50  538.50      557.50      Up      1
6   511.000000      511.00  511.00  585.50      598.50      Down    1
7   564.333333      548.00  590.50  511.00      511.00      Up      1
8   574.333333      552.00  580.00  548.00      590.50      Flat    1
9   537.500000      513.25  574.50  552.00      580.00      Down    1
10  609.500000      582.25  636.75  513.25      574.50      Up      1
11  535.000000      531.00  565.00  582.25      636.75      Down    1
12  567.142857      539.50  588.50  531.00      565.00      Up      1
13  566.625000      546.25  594.25  539.50      588.50      Up      2
14  576.631579      556.00  598.00  546.25      594.25      Up      3
15  558.333333      538.00  584.00  556.00      598.00      Down    1

Is there anyway to do that?

Upvotes: 0

Views: 813

Answers (2)

user15512272
user15512272

Reputation:

You can shift() the Trend column to get trending indexes and then cumsum() within the trending groups:

trending = df.Trend.eq(df.Trend.shift())
df['Counter'] = trending.groupby(trending).cumsum().add(1)

Output:

             A       B       C  B_shifted  C_shifted Trend  Counter
0   553.666667  533.50  574.00        NaN        NaN  Flat        1
1   590.818182  575.50  595.50     533.50     574.00    Up        1
2   531.333333  527.50  536.50     575.50     595.50  Down        1
3   562.000000  562.00  562.00     527.50     536.50    Up        1
4   551.857143  538.50  557.50     562.00     562.00  Down        1
5   592.000000  585.50  598.50     538.50     557.50    Up        1
6   511.000000  511.00  511.00     585.50     598.50  Down        1
7   564.333333  548.00  590.50     511.00     511.00    Up        1
8   574.333333  552.00  580.00     548.00     590.50  Flat        1
9   537.500000  513.25  574.50     552.00     580.00  Down        1
10  609.500000  582.25  636.75     513.25     574.50    Up        1
11  535.000000  531.00  565.00     582.25     636.75  Down        1
12  567.142857  539.50  588.50     531.00     565.00    Up        1
13  566.625000  546.25  594.25     539.50     588.50    Up        2
14  576.631579  556.00  598.00     546.25     594.25    Up        3
15  558.333333  538.00  584.00     556.00     598.00  Down        1

Upvotes: 2

sophocles
sophocles

Reputation: 13821

You can use shift() to compare values with previous values, and if the values are the same you can use cumsum().add(1) (as the default would be to start from 0), wrapped up in numpy's where function.

i = df.Trend
df['Count'] = np.where(i.eq(i.shift()), i.eq(i.shift()).astype(int).cumsum().add(1),1)

Output

df

             A       B       C B_shifted C_shifted Trend  Count
0   553.666667  533.50  574.00       NaN       NaN  Flat      1
1   590.818182  575.50  595.50    533.50       574    Up      1
2   531.333333  527.50  536.50    575.50    595.50  Down      1
3   562.000000  562.00  562.00    527.50    536.50    Up      1
4   551.857143  538.50  557.50       562       562  Down      1
5   592.000000  585.50  598.50    538.50    557.50    Up      1
6   511.000000  511.00  511.00    585.50    598.50  Down      1
7   564.333333  548.00  590.50       511       511    Up      1
8   574.333333  552.00  580.00       548    590.50  Flat      1
9   537.500000  513.25  574.50       552       580  Down      1
10  609.500000  582.25  636.75    513.25    574.50    Up      1
11  535.000000  531.00  565.00    582.25    636.75  Down      1
12  567.142857  539.50  588.50       531       565    Up      1
13  566.625000  546.25  594.25    539.50    588.50    Up      2
14  576.631579  556.00  598.00    546.25    594.25    Up      3
15  558.333333  538.00  584.00       556       598  Down      1

Upvotes: 0

Related Questions