Scott
Scott

Reputation: 77

Pandas Shift Rows and Backfill (Time-Series Alignment)

I have time-series customer data with running totals that look like this:

       week1 | week2 | week3 | week4 | week5
user1     20      40      40      50      50
user2      0      10      20      30      40 
user3      0       0       0      10      10

I am looking for spending trends, so I want to shift all my rows to start at week one and backfill with their last value, resulting in:

       week1 | week2 | week3 | week4 | week5
user1     20      40      40      50      50
user2     10      20      30      40      40
user3     10      10      10      10      10

Any help would be amazing!

Upvotes: 0

Views: 646

Answers (2)

wwnde
wwnde

Reputation: 26676

Nice answer from @Haleemur Ali.

Alternative would be to interpolate backwards.

Data

df=pd.DataFrame({'user':['user1','user2','user2'],'week1':[20,0,0],'week2':[40,10,0],'week3':[40,20,0],'week4':[50,40,10],'week5':[50,40,10]})
df.set_index('user', inplace=True)#.rename_axis(None)
df

Replace zero with NaN

df.replace(0, np.nan, inplace=True)
df

Then just interpolate

df.interpolate(method='linear', axis=1,limit=None,inplace=True,limit_direction='backward',limit_area=None,downcast=None,)

Upvotes: 1

Haleemur Ali
Haleemur Ali

Reputation: 28233

You can do this quite compactly as:

df.iloc[:, 1:] = df.iloc[:, 1:]. \
    apply(lambda row: row.shift(-np.argmax(row > 0)), axis=1). \
    ffill(axis=1)

but there is a lot going on in that 1 statement

iloc[:, 1:] selects all rows, and all but the first column (since we are not interested in touching the user column. My answer assumes that the user is a column, if the user is an index instead, then you can remove both the occurrences of [:, 1:] in this answer.

apply(<function>, axis=1) applies the provided function to each row

np.argmax [as I used here] finds the first index in an array that meets a condition. in this case the first position with value > 0

row.shift(-np.argmax(row > 0)) shifts the row backwards dynamically, based on the position of the first greater-than-0-value.

ffill forward fills null values with the last non-null value.

Upvotes: 2

Related Questions