Reputation: 2219

Pandas: Iterate over existing columns and create new columns based on conditionals

The closest existing question to mine is found here, but I'm running into a hiccup somewhere.

My dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame({'KEY': ['100000003', '100000009', '100000009', '100000009'],
                   'RO_1': [1, 1, 4, 1],
                   'RO_2': [1, 0, 0, 0],
                   'RO_3': [1, 1, 1, 1],
                   'RO_4': [1, 4, 1, 1]})

         KEY  RO_1  RO_2  RO_3  RO_4
0  100000003     1     1     1     1
1  100000009     1     0     1     4
2  100000009     4     0     1     1
3  100000009     1     0     1     1

I want to create 4 additional columns labeled 'Month1' through 'Month4'. Something simple like:

for i in range(3):
    df.loc[1,'Month'+str(i)] = 1 # '1' is just there as a place holder

However, I get a warning message when I execute this code:

"A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead"

I want to combine this with conditionals to fill in each cell for each column and each row.

The code below creates only one column, flagged according to whether a column whose name contains RO_ meets either condition:

namelist = df.columns.tolist()  # Index.get_values() was removed in pandas 1.0
ROList = [s for s in namelist if "RO_" in s]
for col in ROList:
    for i in range(3):
        df['Month'] = np.where(np.logical_or(df[col]==4,df[col]==1), '1', '0') 
df

I tried combining the two snippets, but I am missing a fundamental understanding of how to do this. Any help would be great.

Final expected result:

         KEY  RO_1  RO_2  RO_3  RO_4 Month1 Month2 Month3 Month4
0  100000003     1     1     1     1      1      1      1      1
1  100000009     1     0     1     4      1      0      1      1
2  100000009     4     0     1     1      1      0      1      1
3  100000009     1     0     1     1      1      0      1      1

Upvotes: 0

Answers (3)

BENY

Reputation: 323396

IIUC, you can use enumerate:

namelist = df.columns.tolist()  # Index.get_values() was removed in pandas 1.0
ROList = [s for s in namelist if "RO_" in s]
for i,col in enumerate(ROList):
    # i starts at 0, so use i+1 to get Month1..Month4
    df['Month'+str(i+1)] = np.where(np.logical_or(df[col]==4,df[col]==1), '1', '0')
df
Out[194]: 
         KEY  RO_1  RO_2  RO_3  RO_4 Month1 Month2 Month3 Month4
0  100000003     1     1     1     1      1      1      1      1
1  100000009     1     0     1     4      1      0      1      1
2  100000009     4     0     1     1      1      0      1      1
3  100000009     1     0     1     1      1      0      1      1

Your logic seems to amount to changing 4 to 1:

df.assign(**df.loc[:,ROList].mask(df.loc[:,ROList]==4,1).rename(columns=dict(zip(ROList,list(range(1,len(ROList)+1))))).add_prefix('Month'))
Out[15]: 
         KEY  RO_1  RO_2  RO_3  RO_4  Month1  Month2  Month3  Month4
0  100000003     1     1     1     1       1       1       1       1
1  100000009     1     0     1     4       1       0       1       1
2  100000009     4     0     1     1       1       0       1       1
3  100000009     1     0     1     1       1       0       1       1
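
The loop and the mask version give the same values on the question's data only because the RO_ columns contain nothing but 0, 1 and 4. A minimal sketch of the difference, assuming a hypothetical column that also contains a 2 (the names `s`, `flagged` and `masked` are illustrative, and integer flags are passed to np.where here rather than the '1'/'0' strings used above):

```python
import numpy as np
import pandas as pd

# Hypothetical column containing a value (2) that is absent from the question's data.
s = pd.Series([1, 0, 4, 2], name='RO_1')

# Flagging: 1 where the value is 1 or 4, otherwise 0.
flagged = np.where(s.isin([1, 4]), 1, 0)

# Masking: only the 4 is replaced by 1; every other value is left as it was.
masked = s.mask(s == 4, 1)

print(flagged.tolist())  # [1, 0, 1, 0]
print(masked.tolist())   # [1, 0, 1, 2]
```

If other values can appear in the real data, the flagging form is probably the safer match for the expected result.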

Upvotes: 2

cs95

Reputation: 403218

Use filter + isin + rename, for a single pipelined transformation of your data.

v = (df.filter(regex='^RO_')    # select columns
      .isin([4, 1])             # check if the value is 4 or 1
      .astype(int)              # convert the `bool` result to `int`
      .rename(                  # rename columns
          columns=lambda x: x.replace('RO_', 'Month')
      ))

Or, for the sake of performance,

v = df.filter(regex='^RO_')\
          .isin([4, 1])\
          .astype(int) 
v.columns = v.columns.str.replace('RO_', 'Month')

Finally, concatenate the result with the original.

pd.concat([df, v], axis=1)

         KEY  RO_1  RO_2  RO_3  RO_4  Month1  Month2  Month3  Month4
0  100000003     1     1     1     1       1       1       1       1
1  100000009     1     0     1     4       1       0       1       1
2  100000009     4     0     1     1       1       0       1       1
3  100000009     1     0     1     1       1       0       1       1
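
For reference, the steps above can be put together into one self-contained sketch using the question's data. The names `flags` and `result` are illustrative, and `DataFrame.join` is swapped in for `pd.concat` here; both align on the index, so either should work for this case.

```python
import pandas as pd

df = pd.DataFrame({'KEY': ['100000003', '100000009', '100000009', '100000009'],
                   'RO_1': [1, 1, 4, 1],
                   'RO_2': [1, 0, 0, 0],
                   'RO_3': [1, 1, 1, 1],
                   'RO_4': [1, 4, 1, 1]})

flags = (df.filter(regex='^RO_')     # keep only the RO_ columns
           .isin([1, 4])             # True where the value is 1 or 4
           .astype(int)              # bool -> 0/1
           .rename(columns=lambda name: name.replace('RO_', 'Month')))

# join() returns a new frame; df itself is left unchanged.
result = df.join(flags)
print(result)
```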

Upvotes: 2

Yilun Zhang

Reputation: 9018

Seems like you are creating a new column for each existing column in your dataframe. You can do something like:

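# Loop over the RO_ columns only; including KEY would also create a 'MonthKEY' column
original_cols = [c for c in df.columns if c.startswith("RO_")]
for c in original_cols:
    cname = "Month" + c.split("_")[-1]  # 'RO_1' -> 'Month1'
    df[cname] = df[c].apply(lambda x: 1 if (x == 1) or (x == 4) else 0)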

Upvotes: 0
