user1893148
user1893148

Reputation: 2190

Pandas data frame: adding columns based on previous time periods

I am trying to work through a problem in pandas, being more accustomed to R.

I have a data frame df with three columns: person, period, value

df.head() or the top few rows look like:

  | person | period | value
0 | P22    | 1      | 0
1 | P23    | 1      | 0
2 | P24    | 1      | 1
3 | P25    | 1      | 0
4 | P26    | 1      | 1
5 | P22    | 2      | 1

Notice the last row records a value for period 2 for person P22.

I would now like to add a new column that provides the value from the previous period. So if for P22 the value in period 1 is 0, then this new column would look like:

  | person | period | value  | lastperiod
5 | P22    | 2      | 1      | 0

I believe I need to do something like the following command, having loaded pandas:

for p in df.period.unique():
    df['lastperiod']== [???]

How should this be formulated?

Upvotes: 2

Views: 995

Answers (1)

Andy Hayden
Andy Hayden

Reputation: 375485

You could groupby person and then apply a shift to the values:

In [11]: g = df.groupby('person')

In [12]: g['value'].apply(lambda s: s.shift())
Out[12]: 
1   NaN
2   NaN
3   NaN
4   NaN
5   NaN
6     0
dtype: float64

Adding this as a column:

In [13]: df['lastPeriod'] = g['value'].apply(lambda s: s.shift())

In [14]: df
Out[14]: 
  person  period  value  lastPeriod
1    P22       1      0         NaN
2    P23       1      0         NaN
3    P24       1      1         NaN
4    P25       1      0         NaN
5    P26       1      1         NaN
6    P22       2      1           0

Here the NaN signify missing data (i.e. there wasn't an entry in the previous period).

Upvotes: 3

Related Questions