Pandas Shift Rows and Backfill (Time-Series Alignment)

Question

I have time-series customer data with running totals that look like this:

       week1 | week2 | week3 | week4 | week5
user1     20      40      40      50      50
user2      0      10      20      30      40 
user3      0       0       0      10      10

I am looking for spending trends, so I want to shift all my rows to start at week one and backfill with their last value, resulting in:

       week1 | week2 | week3 | week4 | week5
user1     20      40      40      50      50
user2     10      20      30      40      40
user3     10      10      10      10      10

Any help would be amazing!

Haleemur Ali · Accepted Answer

You can do this quite compactly as:

df.iloc[:, 1:] = df.iloc[:, 1:]. \
    apply(lambda row: row.shift(-np.argmax(row > 0)), axis=1). \
    ffill(axis=1)

but there is a lot going on in that 1 statement

iloc[:, 1:] selects all rows, and all but the first column (since we are not interested in touching the user column. My answer assumes that the user is a column, if the user is an index instead, then you can remove both the occurrences of [:, 1:] in this answer.

apply(, axis=1) applies the provided function to each row

np.argmax [as I used here] finds the first index in an array that meets a condition. in this case the first position with value > 0

row.shift(-np.argmax(row > 0)) shifts the row backwards dynamically, based on the position of the first greater-than-0-value.

ffill forward fills null values with the last non-null value.

Pandas Shift Rows and Backfill (Time-Series Alignment)

Answers (2)

Related Questions