n3a5p7s9t1e3r

Reputation: 286

apply function to varying number of columns using regex

   p1  p2  p3   p4
0   1   4   3   40
1   2   5  70    8
2   3   6   6  500

I have a table that looks like this. I want to use the regex '^p\d' to find the matching column names, apply a function to those columns, and create new columns named n{1}, n{2}, ...

import pandas as pd
import numpy as np

data_1 = {'p1': [1, 2, 3],
          'p2': [4, 5, 6],
          'p3': [3, 70, 6],
          'p4': [40, 8, 500],
         }

df1 = pd.DataFrame(data=data_1)
df1

Upvotes: 0

Views: 61

Answers (2)

Himanshu Poddar

Reputation: 7769

You can achieve the same with Python string functions, but since you asked for a regex solution, here is one using the re module.

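import re

r = re.compile(r"^p\d")
qualified_columns = list(filter(r.match, df1.columns))
new_col_names = [re.sub(r'p(\d+)', r'n\1', col) for col in qualified_columns]
# Row-wise root sum of squares over the matched columns only, so unrelated columns cannot skew it
norm = df1[qualified_columns].pow(2).sum(axis=1).pow(1./2)
df1[new_col_names] = df1[qualified_columns].div(norm, axis=0)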

Result:

   p1  p2  p3   p4        n1        n2        n3        n4
0   1   4   3   40  0.024799  0.099197  0.074398  0.991973
1   2   5  70    8  0.028304  0.070760  0.990643  0.113216
2   3   6   6  500  0.005999  0.011998  0.011998  0.999838
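
As an alternative, pandas can do the column selection itself. The sketch below assumes the same df1 as in the question and uses DataFrame.filter(regex=...) to pick the columns, then renames them with a regex replace on the column index. The names p_cols, row_norm, normalized and result are only illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'p1': [1, 2, 3],
                    'p2': [4, 5, 6],
                    'p3': [3, 70, 6],
                    'p4': [40, 8, 500]})

# Keep only the columns whose names match the pattern (filter uses re.search on the labels)
p_cols = df1.filter(regex=r'^p\d')

# Row-wise root sum of squares of the matched columns
row_norm = np.sqrt((p_cols ** 2).sum(axis=1))

# Divide each matched column by the row norm, then rename p<k> to n<k>
normalized = p_cols.div(row_norm, axis=0)
normalized.columns = normalized.columns.str.replace(r'^p(\d+)', r'n\1', regex=True)

# Attach the new columns to the original frame without modifying df1 in place
result = df1.join(normalized)
print(result)
```

This leaves df1 untouched and returns a new frame, which may be easier to reason about than assigning a list of new columns in place.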

Upvotes: 1

Mortz

Reputation: 4879

The position of the columns does not matter as long as you select them by name. Note that startswith('p') also matches any other column name beginning with 'p', unlike the regex '^p\d'.

pcols = [col for col in df1.columns if col.startswith('p')]
pnums = [col.split('p')[-1] for col in pcols]
pcols_rss = df1[pcols].apply(lambda row: np.sqrt(np.sum(row**2)), axis=1)
for col, num in zip(pcols, pnums):
    df1.loc[:, f'n{{{num}}}'] = df1[col] / pcols_rss

Output

   p1  p2  p3   p4      n{1}      n{2}      n{3}      n{4}
0   1   4   3   40  0.024799  0.099197  0.074398  0.991973
1   2   5  70    8  0.028304  0.070760  0.990643  0.113216
2   3   6   6  500  0.005999  0.011998  0.011998  0.999838
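
If the row-wise apply becomes slow on a large frame, the same result can be computed in one vectorized call. This is a minimal sketch assuming the df1 from the question. It swaps the lambda for np.linalg.norm and uses a compiled regex to pull out the column number, so only names of the form p followed by digits are picked up. The names pattern, matches and rss are illustrative.

```python
import re

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'p1': [1, 2, 3],
                    'p2': [4, 5, 6],
                    'p3': [3, 70, 6],
                    'p4': [40, 8, 500]})

# Match names that are exactly 'p' followed by digits, capturing the digits
pattern = re.compile(r'^p(\d+)$')
matches = {col: pattern.match(col) for col in df1.columns}
pcols = [col for col, m in matches.items() if m]

# Root sum of squares per row, computed in one call on the underlying array
rss = np.linalg.norm(df1[pcols].to_numpy(dtype=float), axis=1)

for col in pcols:
    num = matches[col].group(1)
    # Doubled braces in the f-string produce literal braces, giving names such as n{1}
    df1[f'n{{{num}}}'] = df1[col] / rss

print(df1)
```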

Upvotes: 1
