Reputation: 479
I have a pandas dataframe that looks like this:
import pandas as pd
d = {'value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], 'user': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C',]}
df = pd.DataFrame(data=d)
df
value user
1 A
2 A
3 A
4 A
5 A
6 A
7 A
8 A
9 B
10 B
11 B
12 B
13 C
14 C
15 C
16 C
17 C
18 C
19 C
20 C
I would like to add a new column called value2
to this dataframe and populate it with the values from the value
column, shifted upwards by one row. The last row of each user
group should be removed as well, so pairs that cross a group boundary, e.g. 8-9,
do not appear in the example below.
value value2 user
1 2 A
2 3 A
3 4 A
4 5 A
5 6 A
6 7 A
7 8 A
9 10 B
10 11 B
11 12 B
13 14 C
14 15 C
15 16 C
16 17 C
17 18 C
18 19 C
19 20 C
I know I can shift the rows with pandas' shift()
function, but it doesn't give me the desired output, as you can see below. The solution seems to be to remove the last row of each user
group and to change the order of the columns (the user column should be the third column). Any ideas how to improve this?
df['value2'] = df['value'].shift(-1, fill_value=0)
df
value user value2
1 A 2
2 A 3
3 A 4
4 A 5
5 A 6
6 A 7
7 A 8
8 A 9
9 B 10
10 B 11
11 B 12
12 B 13
13 C 14
14 C 15
15 C 16
16 C 17
17 C 18
18 C 19
19 C 20
20 C 0
Upvotes: 2
Views: 954
Reputation: 862661
Use DataFrameGroupBy.shift with fill_value=-1, then remove the rows where the new column is -1 (column names follow the question's value and value2):
df['value2'] = df.groupby('user')['value'].shift(-1, fill_value=-1)
df = df[df['value2'].ne(-1)]
print (df)
value user value2
0 1 A 2
1 2 A 3
2 3 A 4
3 4 A 5
4 5 A 6
5 6 A 7
6 7 A 8
8 9 B 10
9 10 B 11
10 11 B 12
12 13 C 14
13 14 C 15
14 15 C 16
15 16 C 17
16 17 C 18
17 18 C 19
18 19 C 20
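The -1 sentinel only works if -1 can never be a real value in the shifted column. A minimal sketch of a variant that avoids the sentinel, assuming the question's column names (value, value2, user): let the shift leave NaN in the last row of each group, drop those rows, cast back to integers, and reorder the columns so that user comes third. The variable name out is only illustrative.

```python
import pandas as pd

# Rebuild the question's example frame.
df = pd.DataFrame({
    'value': list(range(1, 21)),
    'user': ['A'] * 8 + ['B'] * 4 + ['C'] * 8,
})

# Shift within each user group; the last row of each group gets NaN.
df['value2'] = df.groupby('user')['value'].shift(-1)

# Drop those rows, restore the integer dtype, and put user last.
out = (
    df.dropna(subset=['value2'])
      .astype({'value2': 'int64'})
      [['value', 'value2', 'user']]
)
print(out)
```

The original index is preserved, so the dropped rows show up as gaps (7, 11 and 19); append .reset_index(drop=True) if a continuous index is wanted.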
Or remove the last row of each user group with Series.duplicated:
df['value2'] = df.groupby('user')['value'].shift(-1, fill_value=-1)
df = df[df['user'].duplicated(keep='last')]
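To see why duplicated(keep='last') selects everything except the last row per user, here is a small sketch on a made-up Series (the data is illustrative, not from the question). With keep='last', the final occurrence of each value is treated as the original and marked False, while every earlier occurrence is marked True.

```python
import pandas as pd

# Hypothetical user column: two rows for A, three for B, one for C.
user = pd.Series(['A', 'A', 'B', 'B', 'B', 'C'])

# True for every occurrence except the last one of each value.
mask = user.duplicated(keep='last')
print(mask.tolist())  # [True, False, True, True, False, False]
```

Note that a user with a single row (C here) is dropped entirely, which matches the intent: that row has no following value to pair with.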
Upvotes: 2