Reputation: 77
I have time-series customer data with running totals that look like this:
week1 | week2 | week3 | week4 | week5
user1 20 40 40 50 50
user2 0 10 20 30 40
user3 0 0 0 10 10
I am looking for spending trends, so I want to shift all my rows to start at week one and backfill with their last value, resulting in:
week1 | week2 | week3 | week4 | week5
user1 20 40 40 50 50
user2 10 20 30 40 40
user3 10 10 10 10 10
Any help would be amazing!
Upvotes: 0
Views: 646
Reputation: 26676
Nice answer from @Haleemur Ali.
Alternative would be to interpolate backwards.
Data
df=pd.DataFrame({'user':['user1','user2','user2'],'week1':[20,0,0],'week2':[40,10,0],'week3':[40,20,0],'week4':[50,40,10],'week5':[50,40,10]})
df.set_index('user', inplace=True)#.rename_axis(None)
df
Replace zero with NaN
df.replace(0, np.nan, inplace=True)
df
Then just interpolate
df.interpolate(method='linear', axis=1,limit=None,inplace=True,limit_direction='backward',limit_area=None,downcast=None,)
Upvotes: 1
Reputation: 28233
You can do this quite compactly as:
df.iloc[:, 1:] = df.iloc[:, 1:]. \
apply(lambda row: row.shift(-np.argmax(row > 0)), axis=1). \
ffill(axis=1)
but there is a lot going on in that 1 statement
iloc[:, 1:]
selects all rows, and all but the first column (since we are not interested in touching the user column. My answer assumes that the user is a column, if the user is an index instead, then you can remove both the occurrences of [:, 1:]
in this answer.
apply(<function>, axis=1)
applies the provided function to each row
np.argmax
[as I used here] finds the first index in an array that meets a condition. in this case the first position with value > 0
row.shift(-np.argmax(row > 0))
shifts the row backwards dynamically, based on the position of the first greater-than-0-value.
ffill
forward fills null values with the last non-null value.
Upvotes: 2