Reputation: 13
How do I go about doing the following in a pandas DataFrame? I have a time series, and I want to create a new column that, for rows with the same id, looks up the stage value from the previous epoch. See picture. I would like to do the following:
- Create a new column, previous_epoch_stage.
- Group by id.
- If epoch > 1: fill the previous_epoch_stage column with the stage value from the epoch-1 row.
- If epoch == 1: fill the previous_epoch_stage value with the stage value from the same row.
Upvotes: 0
Views: 1009
Reputation: 33950
Generally you don't need to create an extra column if all you want is to access a lagged version of stage. You'd simply do df.groupby('id'), then reference ['stage'].shift(1) within each group.
But if you really insist on doing this, here is a solution using boolean indexing, shift() and fillna():
# Lagged assignment: the previous row's 'stage' within each id (NaN where epoch == 1)
df['previous_epoch_stage'] = df.groupby('id')['stage'].shift(1)
# Now fill the NaNs from the same row's 'stage' column
df['previous_epoch_stage'] = df['previous_epoch_stage'].fillna(df['stage'])
# If 'stage' is integer-valued, undo the int-to-float coercion caused by the NaNs:
df['previous_epoch_stage'] = df['previous_epoch_stage'].astype(int)
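To make the steps above concrete, here is a minimal sketch on made-up data. The sample values, the integer-coded stage column, and the explicit sort are assumptions for illustration and are not taken from the question.

```python
import pandas as pd

# Hypothetical sample data: two ids, three epochs each, integer-coded stages.
df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2, 2],
    "epoch": [1, 2, 3, 1, 2, 3],
    "stage": [0, 1, 2, 3, 3, 1],
})

# Sort so that "previous row within the group" means "previous epoch".
df = df.sort_values(["id", "epoch"]).reset_index(drop=True)

# Lag stage by one row within each id; the first epoch of each id gets NaN.
lagged = df.groupby("id")["stage"].shift(1)

# Fall back to the row's own stage where there is no previous epoch,
# then restore the integer dtype that the NaNs forced to float.
df["previous_epoch_stage"] = lagged.fillna(df["stage"]).astype(int)

print(df)
```

If stage holds strings rather than integers, the final astype(int) step should be dropped.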
Notes:
- Regarding "fill the previous_epoch_stage column with the stage value from the epoch-1 row": if we assume/require that rows are sorted in increasing epoch starting from 1, then we can just take df['stage'].head()
- There's [df.where(cond, other, ...)](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.where.html) that does a vectorized if-else, and in this case other would need to be a function ('callable'), but it doesn't play nicely with groupby, so use boolean indexing instead.
- shift() also lets you set fill_value (NaN by default) or specify arbitrary periods (+ve or -ve).
Upvotes: 2
Reputation: 18950
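Perhaps there's a more idiomatic pandas solution, but another option could be:
# Assumes rows are sorted by id, then epoch, so row i-1 is the previous epoch of the same id
df['prev_epoch_stage'] = [df['stage'].iloc[i-1] if e > 1 else df['stage'].iloc[i]
                          for i, e in enumerate(df['epoch'])]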
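Because the list comprehension looks rows up by position, it only gives the intended result when rows are ordered by id and then by epoch, with each id starting at epoch 1. A small sketch of that assumption, using made-up, deliberately unsorted data and cross-checking against a groupby/shift formulation:

```python
import pandas as pd

# Hypothetical, deliberately unsorted sample data.
df = pd.DataFrame({
    "id":    [2, 1, 1, 2, 1, 2],
    "epoch": [1, 2, 1, 3, 3, 2],
    "stage": [3, 1, 0, 1, 2, 3],
})

# Positional lookups only make sense once rows are ordered by id, then epoch.
df = df.sort_values(["id", "epoch"]).reset_index(drop=True)

stage = df["stage"]
df["prev_epoch_stage"] = [
    stage.iloc[i - 1] if e > 1 else stage.iloc[i]
    for i, e in enumerate(df["epoch"])
]

# Cross-check against the groupby/shift formulation on the same sorted frame.
expected = df.groupby("id")["stage"].shift(1).fillna(df["stage"]).astype(int)
assert df["prev_epoch_stage"].tolist() == expected.tolist()
```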
Upvotes: 0