KrSid
KrSid

Reputation: 50

Shifting rows per index with a different value in a pandas dataframe

Below is the sample data:

sample_col = ["col1","col2","col3","col4", "col5"]
sample_data = [[1,2,3,4,5],[6,7,8,9,10],[11,12,13,14,15],[16,17,18,19,20]]
sample = pd.DataFrame(data = data_data, columns = data_col)

sample

   col1  col2  col3  col4 col5
 0  1    2     3     4    5
 1  6    7     8     9    10
 2  11   12    13    14   15
 3  16   17    18    19   20
 
m = [0,1,2,1]

I need to shift each row in the data frame sample by the corresponding value in the list m.

The output is supposed to look like this

   col1  col2  col3  col4 col5
 0  1    2     3     4    5
 1  NaN  6     7     8    9
 2  NaN  NaN   11    12   13
 3  NaN  16    17    18   19

I know I can use sample.iloc[i].shift(m[i]) to achive what I want one by one or all by applying a loop. But I have a very large data set and it takes forever to complete.

So is there a better way to achieve what I am trying to do? Any help is welcome.

Upvotes: 1

Views: 511

Answers (1)

James
James

Reputation: 36608

You can use boolean slicing to select just the rows of interest for each shift operation. It will still be slow if the set of m is large, but faster than what you have.

import pandas as pd
import numpy as np

sample_col = ["col1","col2","col3","col4", "col5"]
sample_data = [[1,2,3,4,5],[6,7,8,9,10],[11,12,13,14,15],[16,17,18,19,20]]
sample = pd.DataFrame(data = sample_data, columns = sample_col)
m = [0,1,2,1]

marr = np.array(m)
for i in sorted(set(m)):
    if i==0:
        continue
    sample.loc[marr==i] = sample.loc[marr==i].shift(i, axis=1)

Upvotes: 1

Related Questions