Mitch
Mitch

Reputation: 596

How to insert columns in a pandas dataframe based on multiple conditions?

I have a dataframe that looks like:

df_test = pd.DataFrame({
    0: ['Property1', 1.2, 1.5, 2.6], 
    1: ["Std", 0.1,0.01,0.02],
    2: ["Rep", 3,3,3],
    3: ["Property2", 3.1,18.2,10.66],
    4: ["Property3", 22,33,44],
    5: ["Rep", 3,3,3],
    6: ["Property4", 6,12,14.23],
    7: ["Property5", 3.1,18.2,10.66],
    8: ["Std", 1,0.2,0.66],
})
         0     1    2          3          4    5          6          7     8
0  Property1   Std  Rep  Property2  Property3  Rep  Property4  Property5   Std
1        1.2   0.1    3        3.1         22    3          6        3.1     1
2        1.5  0.01    3       18.2         33    3         12       18.2   0.2
3        2.6  0.02    3      10.66         44    3      14.23      10.66  0.66
I want to insert columns if the value in df[1,i] is not ["Std", "Rep"] and df[1,i+1] is not "Std".

If I'm iterating over the columns (starting from the end) it would look like:

while i < j:
    if df.loc[1,i] != "Std" and df.loc[1,i] != "Rep":
        if df.loc[1,i+1] != "Std":
            df.insert(i+1,"Std",np.nan)
            df.columns = pd.RangeIndex(df.columns.size)
            df.loc[1,i+1]="Std"
    i+=1
    j = len(df.columns)-1

I'm trying to do with using select but I don't know how to set conditions that depend on two consecutive columns like I did in the while loop.

Expected result:

          0     1    2          3    4          5    6    7          8    9   \
0  Property1   Std  Rep  Property2  Std  Property3  Std  Rep  Property4  Std   
1        1.2   0.1    3        3.1  nan         22  nan    3          6  nan   
2        1.5  0.01    3       18.2  nan         33  nan    3         12  nan   
3        2.6  0.02    3      10.66  nan         44  nan    3      14.23  nan   

          10    11  
0  Property5   Std  
1        3.1     1  
2       18.2   0.2  
3      10.66  0.66  

Is there a way to vectorize this loop?

Upvotes: 0

Views: 42

Answers (1)

BENY
BENY

Reputation: 323316

Let us try

s = df.iloc[0,:]
s = s.str.startswith('Property')&s.shift(-1).ne('Std')
out = pd.concat([df,pd.DataFrame([['Std',np.nan,np.nan,np.nan]]*s.sum(),
                                 index = s.index[s],
                                 columns=df.index).T],axis=1).sort_index(level=0,axis=1)
out
           0     1    2          3    3  ...    5          6    6          7     8
0  Property1   Std  Rep  Property2  Std  ...  Rep  Property4  Std  Property5   Std
1        1.2   0.1    3        3.1  NaN  ...    3          6  NaN        3.1     1
2        1.5  0.01    3       18.2  NaN  ...    3         12  NaN       18.2   0.2
3        2.6  0.02    3      10.66  NaN  ...    3      14.23  NaN      10.66  0.66
[4 rows x 12 columns]

Upvotes: 1

Related Questions