Could you help me implement the following in Python pandas? I have the following data:
id=["Train A","Train A","Train A","Train B","Train B","Train B"]
start = ["A","B","C","D","E","F"]
end = ["G","H","I","J","K","L"]
arrival_time = ["0","2016-05-19 13:50:00","2016-05-19 21:25:00","0","2016-05-24 18:30:00","2016-05-26 12:15:00"]
departure_time = ["2016-05-19 08:25:00","2016-05-19 16:00:00","2016-05-20 07:25:00","2016-05-24 12:50:00","2016-05-25 20:00:00","2016-05-26 19:45:00"]
capacity = ["2","2","3","3","2","3"]
Put together, this gives the following table:
id       arrival_time         departure_time       start  end  capacity
Train A  0                    2016-05-19 08:25:00  A      G    2
Train A  2016-05-19 13:50:00  2016-05-19 16:00:00  B      H    2
Train A  2016-05-19 21:25:00  2016-05-20 07:25:00  C      I    3
Train B  0                    2016-05-24 12:50:00  D      J    3
Train B  2016-05-24 18:30:00  2016-05-25 20:00:00  E      K    2
Train B  2016-05-26 12:15:00  2016-05-26 19:45:00  F      L    3
I would like to add two columns, source and sink. If the time difference between arrival and departure is less than 3 hours, the trip continues: the source is the start of the first leg of the trip, and the sink is the end of the last leg before the trip breaks (i.e. before a time difference of more than 3 hours). The expected result is:
time difference  source  sink
-                A       H
02:10:00         A       H
10:00:00         C       I
-                D       K
01:30:00         D       K
07:30:00         F       L
Upvotes: 2
Views: 320
Reputation: 153460
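The snippet below assumes df already exists and that both time columns are datetimes. A minimal sketch of one way to build it from the question's lists, assuming "0" marks a missing arrival and using 2016-05-25 20:00:00 for Train B's second departure, as in the question's table:

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["Train A", "Train A", "Train A", "Train B", "Train B", "Train B"],
    "arrival_time": ["0", "2016-05-19 13:50:00", "2016-05-19 21:25:00",
                     "0", "2016-05-24 18:30:00", "2016-05-26 12:15:00"],
    "departure_time": ["2016-05-19 08:25:00", "2016-05-19 16:00:00", "2016-05-20 07:25:00",
                       "2016-05-24 12:50:00", "2016-05-25 20:00:00", "2016-05-26 19:45:00"],
    "start": ["A", "B", "C", "D", "E", "F"],
    "end": ["G", "H", "I", "J", "K", "L"],
    "capacity": ["2", "2", "3", "3", "2", "3"],
})

# Assumption: "0" means "no arrival recorded"; mask it so to_datetime yields NaT.
df["arrival_time"] = pd.to_datetime(df["arrival_time"].mask(df["arrival_time"] == "0"))
df["departure_time"] = pd.to_datetime(df["departure_time"])
df["capacity"] = df["capacity"].astype(int)
```

With the "0" placeholders turned into NaT, subtracting the two columns gives NaT for the first leg of each train instead of raising an error.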
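import numpy as np
# Assumes arrival_time and departure_time are datetime64 columns, with the "0" placeholders converted to NaT.
df = df.assign(timediff=(df.departure_time - df.arrival_time))
# Stop shorter than 3 hours: reuse the previous row's start as the source (.dt.seconds ignores whole days).
df = df.assign(source = np.where(df.timediff.dt.seconds / 3600 < 3, df.shift(1).start, df.start))
# Previous row's stop longer than 3 hours: take the next row's end as the sink.
df = df.assign(sink = np.where(df.timediff.dt.seconds.shift(1) / 3600 > 3, df.shift(-1).end, df.end))
print(df)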
Output:
id arrival_time departure_time start end capacity sink \
0 Train A NaT 2016-05-19 08:25:00 A G 2 G
1 Train A 2016-05-19 13:50:00 2016-05-19 16:00:00 B H 2 H
2 Train A 2016-05-19 21:25:00 2016-05-20 07:25:00 C I 3 I
3 Train B NaT 2016-05-24 12:50:00 D J 3 K
4 Train B 2016-05-24 18:30:00 2016-05-25 20:00:00 E K 2 K
5 Train B 2016-05-26 12:15:00 2016-05-26 19:45:00 F L 3 L
timediff source
0 NaT A
1 0 days 02:10:00 A
2 0 days 10:00:00 C
3 NaT D
4 1 days 01:30:00 D
5 0 days 07:30:00 F
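Note that row 0 gets sink G in this output, while the question's expected table has H. One possible variant, sketched here under a few assumptions, labels each run of legs with a trip number and takes the first start and last end per trip. The trip column and the new_trip name are illustrative and not part of the code above. Like the code above, it uses .dt.seconds, which drops the days component, so that 1 days 01:30:00 counts as 01:30:00 as in the question's expected table.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["Train A", "Train A", "Train A", "Train B", "Train B", "Train B"],
    "start": list("ABCDEF"),
    "end": list("GHIJKL"),
    "arrival_time": pd.to_datetime([None, "2016-05-19 13:50:00", "2016-05-19 21:25:00",
                                    None, "2016-05-24 18:30:00", "2016-05-26 12:15:00"]),
    "departure_time": pd.to_datetime(["2016-05-19 08:25:00", "2016-05-19 16:00:00",
                                      "2016-05-20 07:25:00", "2016-05-24 12:50:00",
                                      "2016-05-25 20:00:00", "2016-05-26 19:45:00"]),
})

df["timediff"] = df["departure_time"] - df["arrival_time"]

# Hours within the day only (days component dropped); NaN where arrival is NaT.
hours = df["timediff"].dt.seconds / 3600

# A leg starts a new trip unless its stop is shorter than 3 hours.
# NaN < 3 is False, so the first leg of each train always starts a trip.
new_trip = ~(hours < 3)

# Running trip number within each train (hypothetical helper column).
df["trip"] = new_trip.astype(int).groupby(df["id"]).cumsum()

# Source is the first start of the trip, sink is the last end of the trip.
trips = df.groupby(["id", "trip"])
df["source"] = trips["start"].transform("first")
df["sink"] = trips["end"].transform("last")

print(df[["id", "timediff", "source", "sink"]])
```

Because the trip is labelled explicitly, this also handles chains of more than two consecutive short stops, which a single shift(1) or shift(-1) does not.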
Upvotes: 2