Reputation: 50
Below is the sample data:
sample_col = ["col1","col2","col3","col4", "col5"]
sample_data = [[1,2,3,4,5],[6,7,8,9,10],[11,12,13,14,15],[16,17,18,19,20]]
sample = pd.DataFrame(data = data_data, columns = data_col)
sample
col1 col2 col3 col4 col5
0 1 2 3 4 5
1 6 7 8 9 10
2 11 12 13 14 15
3 16 17 18 19 20
m = [0,1,2,1]
I need to shift each row in the data frame sample by the corresponding value in the list m
.
The output is supposed to look like this
col1 col2 col3 col4 col5
0 1 2 3 4 5
1 NaN 6 7 8 9
2 NaN NaN 11 12 13
3 NaN 16 17 18 19
I know I can use sample.iloc[i].shift(m[i])
to achive what I want one by one or all by applying a loop. But I have a very large data set and it takes forever to complete.
So is there a better way to achieve what I am trying to do? Any help is welcome.
Upvotes: 1
Views: 511
Reputation: 36608
You can use boolean slicing to select just the rows of interest for each shift operation. It will still be slow if the set
of m
is large, but faster than what you have.
import pandas as pd
import numpy as np
sample_col = ["col1","col2","col3","col4", "col5"]
sample_data = [[1,2,3,4,5],[6,7,8,9,10],[11,12,13,14,15],[16,17,18,19,20]]
sample = pd.DataFrame(data = sample_data, columns = sample_col)
m = [0,1,2,1]
marr = np.array(m)
for i in sorted(set(m)):
if i==0:
continue
sample.loc[marr==i] = sample.loc[marr==i].shift(i, axis=1)
Upvotes: 1