Apavlo

Reputation: 13

Creating a new pandas column which takes values from a row, based on id

How do I go about doing the following in a pandas dataframe? I have a time series and want to create a new column that, for each row, holds the value from the previous epoch with the same id. See picture. I would like to do the following:

  1. Create new column called previous_epoch_stage.
  2. For each id:

enter image description here

Upvotes: 0

Views: 1009

Answers (2)

smci

Reputation: 33950

Generally you don't need to create an extra column if all you want is to access a lagged version of stage. You'd simply do df.groupby('id') and then reference ['stage'].shift(1), which shifts within each group.

But if you really insist on doing this, here is a solution using groupby(), shift() and fillna():

# Lag 'stage' by one row within each id (rows must be sorted by epoch within each id);
# the first epoch of each id has no previous row and gets NaN
df['previous_epoch_stage'] = df.groupby('id')['stage'].shift(1)
# Now fill those NaNs from the 'stage' column (assign back instead of using inplace=True on a column selection)
df['previous_epoch_stage'] = df['previous_epoch_stage'].fillna(df['stage'])
# The NaNs coerced your ints to floats; cast back to int if you want to undo that
df['previous_epoch_stage'] = df['previous_epoch_stage'].astype(int)
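
For anyone who wants to try this end to end, here is a minimal runnable sketch. The sample data is made up for illustration (the question's picture is not available), and it assumes the intent is to lag the stage column within each id, with rows already sorted by epoch:

```python
import pandas as pd

# Hypothetical sample data: two ids, rows sorted by epoch within each id.
df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2],
    "epoch": [1, 2, 3, 1, 2],
    "stage": [5, 7, 9, 4, 6],
})

# Lag 'stage' by one row within each id; the first row of each id becomes NaN.
lagged = df.groupby("id")["stage"].shift(1)

# Where there is no previous epoch, fall back to the row's own stage,
# then cast back to int because the NaNs forced a float dtype.
df["previous_epoch_stage"] = lagged.fillna(df["stage"]).astype(int)

print(df)
```

Note that the shift never crosses from one id into the next, which is the point of grouping first.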

Notes:

  1. we can shortcut "fill previous_epoch_stage column with stage value from epoch-1 row" if we assume/require rows are sorted in increasing epoch starting from 1, then we can just take df['stage'].head()
  2. there's also a useful helper function [df.where(cond, other, ...)](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.where.html) that does vectorized if-else, and in this case other would need to be a function ('callable'), but it doesn't play nicely with groupby, so use boolean indexing instead.
  3. .shift() is neat because it allows you to customize fill_value=NaN, or specify arbitrary periods (+ve or -ve).
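
As a quick illustration of the fill_value and periods options from note 3, here is a small sketch on a made-up Series (the values and variable names are only for demonstration):

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])

# Default: lag by one position; the gap is filled with NaN, so ints become floats.
prev_default = s.shift(1)

# Supplying fill_value avoids the NaN, so the integer dtype is preserved.
prev_filled = s.shift(1, fill_value=0)

# A negative period shifts the other way (a "lead" rather than a "lag").
next_filled = s.shift(-1, fill_value=0)

print(prev_filled.tolist())  # [0, 10, 20, 30]
print(next_filled.tolist())  # [20, 30, 40, 0]
```

The same arguments should also work on a grouped column, e.g. df.groupby('id')['stage'].shift(1, fill_value=0), though a constant fill is not the same as falling back to the row's own stage.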

Upvotes: 2

fferri

Reputation: 18950

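Perhaps there's a more idiomatic pandas solution, but another option could be:

# Assumes rows are sorted by id and then epoch, so the previous row has the same id whenever epoch > 1
df['prev_epoch_stage'] = [df['stage'].iloc[i-1] if e > 1 else df['stage'].iloc[i]
                          for i, e in enumerate(df['epoch'])]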

Upvotes: 0
