Bernando Purba
Bernando Purba

Reputation: 551

Calculates new columns based on other columns' values in python pandas dataframe

I want to create a new column based on the value of other columns in pandas dataframe. My data is about a truck that moves back and forth from loading to dumping location. I want calculates the distance of current road segment to the last segment. The example of the data shown below:

State      | segment length | 
-----------------------------
Loaded     |    20          |
Loaded     |    10          |
Loaded     |    10          |
Empty      |    15          |
Empty      |    10          |
Empty      |    10          |
Loaded     |    30          |
Loaded     |    20          |
Loaded     |    10          |

So, the end of the road will be the record where the State changes. Hence I want to calculate the distance from end of the road. The final dataframe will be:

State   | segment length | Distance to end
Loaded  |       20       |     40
Loaded  |       10       |     20
Loaded  |       10       |     10
Empty   |       15       |     35
Empty   |       10       |     20
Empty   |       10       |     10
Loaded  |       30       |     60
Loaded  |       20       |     30
Loaded  |       10       |     10

Can anyone help? Thank you in advance

Upvotes: 2

Views: 121

Answers (2)

Allen Qin
Allen Qin

Reputation: 19947

df['Distance to end'] = (
    df.assign(i=df.State.ne(df.State.shift()).cumsum())
    .assign(s=lambda x: x.groupby(by='i')['segment length'].transform(sum))
    .groupby(by='i')
    .apply(lambda x: x.s.sub(x['segment length'].shift().cumsum().fillna(0)))
    .values
)

    State   segment length  Distance to end
0   Loaded  20              40.0
1   Loaded  10              20.0
2   Loaded  10              10.0
3   Empty   15              35.0
4   Empty   10              20.0
5   Empty   10              10.0
6   Loaded  30              60.0
7   Loaded  20              30.0
8   Loaded  10              10.0

Upvotes: 0

jezrael
jezrael

Reputation: 862521

Use GroupBy.cumsum with DataFrame.iloc for swap ordering and custom Series for get unique consecutive groups with shift and cumsum:

g = df['State'].ne(df['State'].shift()).cumsum()
df['Distance to end'] = df.iloc[::-1].groupby(g)['segment length'].cumsum()
print (df)
    State  segment length  Distance to end
0  Loaded              20               40
1  Loaded              10               20
2  Loaded              10               10
3   Empty              15               35
4   Empty              10               20
5   Empty              10               10
6  Loaded              30               60
7  Loaded              20               30
8  Loaded              10               10

Detail:

print (g)
0    1
1    1
2    1
3    2
4    2
5    2
6    3
7    3
8    3
Name: State, dtype: int32

Upvotes: 4

Related Questions