Reputation: 575
I have a dataframe as follows:
id | time
1 | 10:21
1 | 10:22
1 | 10:23
2 | 10:40
2 | 10:45
2 | 10:50
I would like to add a new column as follows:
id | time | new_time
1 | 10:21 | 10:22
1 | 10:22 | 10:23
1 | 10:23 | None
2 | 10:40 | 10:45
2 | 10:45 | 10:50
2 | 10:50 | None
In other words, I would like to fill the new column by comparing the id values of consecutive rows. If two consecutive rows have the same id, new_time in the first row should take the time value from the second row. If the id values differ, new_time should be None. How can I achieve this using Python or pandas?
Upvotes: 5
Views: 1944
Reputation: 731
Assuming ID is currently the index, I would try:
df = df.reset_index()
shifted = df.shift(-1)
df['new_time'] = shifted.time
df.loc[df.id != shifted.id, "new_time"] = None
df = df.set_index("id")
If it's not an index, you can just skip the reset_index and set_index lines.
It shifts the entire dataframe up by one row, lines the shifted copy up against the original, and sets new_time to None wherever the id values no longer match.
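For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same shift-and-mask idea. It assumes id is an ordinary column rather than the index, and the sample frame is simply rebuilt from the data in the question.

```python
import pandas as pd

# Sample data rebuilt from the question; id is a regular column here.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "time": ["10:21", "10:22", "10:23", "10:40", "10:45", "10:50"],
})

# Shift every column up by one row so row i holds the values of row i + 1.
shifted = df.shift(-1)

# Take the next row's time as the candidate new_time.
df["new_time"] = shifted["time"]

# Blank out new_time wherever the next row belongs to a different id.
# The final row compares against a missing value, so it is blanked as well.
df.loc[df["id"] != shifted["id"], "new_time"] = None

print(df)
```

Depending on the pandas version and column dtype, the blanked entries may print as None or NaN; either way isna() reports them as missing.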
Upvotes: 2
Reputation: 4757
Use .shift() to get next record:
df['new_time'] = df.shift(-1).time
Results:
   id   time new_time
0   1  10:21    10:22
1   1  10:22    10:23
2   1  10:23    10:40
3   2  10:40    10:45
4   2  10:45    10:50
5   2  10:50      NaN
Then assign np.nan to the last row of each id (this needs import numpy as np):
df.loc[df.groupby('id', as_index=False).nth([-1]).index, 'new_time'] = np.nan
Results:
   id   time new_time
0   1  10:21    10:22
1   1  10:22    10:23
2   1  10:23      NaN
3   2  10:40    10:45
4   2  10:45    10:50
5   2  10:50      NaN
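As a possible alternative to the two-step approach above (not what this answer does), the shift can be applied within each id group, so values never cross from one id into the next and no clean-up step is needed. A minimal sketch, assuming the rows are already ordered by time within each id and that id is a regular column:

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "time": ["10:21", "10:22", "10:23", "10:40", "10:45", "10:50"],
})

# Shift time up by one row separately inside each id group.
# The last row of every group has no successor, so it is left missing.
df["new_time"] = df.groupby("id")["time"].shift(-1)

print(df)
```

This relies on groupby(...).shift keeping the original row index, which is why the result can be assigned straight back as a column.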
Upvotes: 6