Reputation: 551
I want to create a new column based on the value of other columns in pandas dataframe. My data is about a truck that moves back and forth from loading to dumping location. I want calculates the distance of current road segment to the last segment. The example of the data shown below:
State | segment length |
-----------------------------
Loaded | 20 |
Loaded | 10 |
Loaded | 10 |
Empty | 15 |
Empty | 10 |
Empty | 10 |
Loaded | 30 |
Loaded | 20 |
Loaded | 10 |
So, the end of the road will be the record where the State changes. Hence I want to calculate the distance from end of the road. The final dataframe will be:
State | segment length | Distance to end
Loaded | 20 | 40
Loaded | 10 | 20
Loaded | 10 | 10
Empty | 15 | 35
Empty | 10 | 20
Empty | 10 | 10
Loaded | 30 | 60
Loaded | 20 | 30
Loaded | 10 | 10
Can anyone help? Thank you in advance
Upvotes: 2
Views: 121
Reputation: 19947
df['Distance to end'] = (
df.assign(i=df.State.ne(df.State.shift()).cumsum())
.assign(s=lambda x: x.groupby(by='i')['segment length'].transform(sum))
.groupby(by='i')
.apply(lambda x: x.s.sub(x['segment length'].shift().cumsum().fillna(0)))
.values
)
State segment length Distance to end
0 Loaded 20 40.0
1 Loaded 10 20.0
2 Loaded 10 10.0
3 Empty 15 35.0
4 Empty 10 20.0
5 Empty 10 10.0
6 Loaded 30 60.0
7 Loaded 20 30.0
8 Loaded 10 10.0
Upvotes: 0
Reputation: 862521
Use GroupBy.cumsum
with DataFrame.iloc
for swap ordering and custom Series
for get unique consecutive groups with shift
and cumsum
:
g = df['State'].ne(df['State'].shift()).cumsum()
df['Distance to end'] = df.iloc[::-1].groupby(g)['segment length'].cumsum()
print (df)
State segment length Distance to end
0 Loaded 20 40
1 Loaded 10 20
2 Loaded 10 10
3 Empty 15 35
4 Empty 10 20
5 Empty 10 10
6 Loaded 30 60
7 Loaded 20 30
8 Loaded 10 10
Detail:
print (g)
0 1
1 1
2 1
3 2
4 2
5 2
6 3
7 3
8 3
Name: State, dtype: int32
Upvotes: 4