sampeterson

Reputation: 479

Shift values in pandas dataframe vertically

I have a pandas dataframe that looks like this:

import pandas as pd
d = {'value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], 'user': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C',]}
df = pd.DataFrame(data=d)
df

    value   user
    1       A
    2       A
    3       A
    4       A
    5       A
    6       A
    7       A
    8       A
    9       B
    10      B
    11      B
    12      B
    13      C
    14      C
    15      C
    16      C
    17      C
    18      C
    19      C
    20      C

I would like to add a new column called value2 to this dataframe, populated with the values from the value column shifted upwards by one row. The last row of each user group should also be removed. In the example below you can see that, e.g., the value pair 8-9 is missing.

    value  value2      user
    1      2           A
    2      3           A
    3      4           A
    4      5           A
    5      6           A
    6      7           A
    7      8           A
    9      10          B
    10     11          B
    11     12          B
    13     14          C
    14     15          C
    15     16          C
    16     17          C
    17     18          C
    18     19          C
    19     20          C

I know I can shift the rows with pandas' shift() function, but it doesn't give me the desired output, as you can see below. What is still missing is to remove the last row of each user group and to change the order of the columns (the user column should be the third column). Any ideas how to improve this?

df['value2'] = df['value'].shift(-1, fill_value=0)
df


    value   user    value2
    1       A       2
    2       A       3
    3       A       4
    4       A       5
    5       A       6
    6       A       7
    7       A       8
    8       A       9
    9       B       10
    10      B       11
    11      B       12
    12      B       13
    13      C       14
    14      C       15
    15      C       16
    16      C       17
    17      C       18
    18      C       19
    19      C       20
    20      C       0

Upvotes: 2

Views: 954

Answers (1)

jezrael

Reputation: 862661

Use DataFrameGroupBy.shift with fill_value=-1, so that the last row of each user is marked with -1, and then remove the -1 rows:

df['value2'] = df.groupby('user')['value'].shift(-1, fill_value=-1)

df = df[df['value2'].ne(-1)]
print(df)
    value user  value2
0       1    A       2
1       2    A       3
2       3    A       4
3       4    A       5
4       5    A       6
5       6    A       7
6       7    A       8
8       9    B      10
9      10    B      11
10     11    B      12
12     13    C      14
13     14    C      15
14     15    C      16
15     16    C      17
16     17    C      18
17     18    C      19
18     19    C      20
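
Note that the -1 sentinel only works if -1 never occurs as a real value in the data. A minimal sketch of a sentinel-free variant, assuming the question's column names (value, user) and that the value column has no missing values of its own: shift without fill_value, drop the resulting NaN rows, and cast back to integers. The name out is just illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "value": list(range(1, 21)),
    "user": ["A"] * 8 + ["B"] * 4 + ["C"] * 8,
})

# Shift within each user; the last row of every group becomes NaN
df["value2"] = df.groupby("user")["value"].shift(-1)

# Drop the rows without a successor and restore the integer dtype
# (the NaN introduced by shift turns the column into float)
out = df.dropna(subset=["value2"]).astype({"value2": "int64"})
print(out)
```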

Or remove the last row of each user with Series.duplicated:

df['value2'] = df.groupby('user')['value'].shift(-1, fill_value=-1)

df = df[df['user'].duplicated(keep='last')]
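
To also get the column order asked for in the question (user as the third column), the columns can be selected explicitly. A sketch under the same assumptions about the column names; result is an illustrative name, and resetting the index is optional:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "value": list(range(1, 21)),
    "user": ["A"] * 8 + ["B"] * 4 + ["C"] * 8,
})

# Next value within each user (NaN on the last row of each group)
df["value2"] = df.groupby("user")["value"].shift(-1)

# duplicated(keep="last") is True for every row except the last
# occurrence of each user, so the mask drops exactly those last rows
mask = df["user"].duplicated(keep="last")

# Select the columns in the desired order and cast value2 back to int
result = (
    df.loc[mask, ["value", "value2", "user"]]
      .astype({"value2": "int64"})
      .reset_index(drop=True)  # optional: renumber the rows 0..n-1
)
print(result)
```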

Upvotes: 2
