apply function to varying number of columns using regex

Question

	p1	p2	p3	p4
0	1	4	3	40
1	2	5	70	8
2	3	6	6	500

I have a table that looks like this i want to apply regex = '^p\d' to search for specific column names and create new columns name n{1}, n{2}, ...

n{1} = p{1}/ SQUARE ROOT OF (p{1}**2 + p{2}**2 + p{3}**2 + ...)
n{2} = p{2}/ SQUARE ROOT OF (p{1}**2 + p{2}**2 + p{3}**2 + ...)
and so on upto... n_{n}
index location of p1, p2, p3 varies for different csv files. So I am trying search using regex to search column names.
number of p rows also varies for different csv files. I am trying to stop creating new columns loop if the next p column does not exist. Not sure how to do it.

import pandas as pd
import numpy as np

data_1 = {'p1': [1, 2, 3],
          'p2': [4, 5, 6],
          'p3': [3, 70 ,6],
          'p4': [40, 8, 500],
         }



df1 = pd.DataFrame(data = data_1)
df1

Himanshu Poddar · Accepted Answer

You can use python string function to achive the same, but since you asked for regex solution, here is how you can do using the same.

r = re.compile("^p\d")
qualified_columns = list(filter(r.match, df1.columns))
new_col_names = [re.sub(r'p(\d+)', r'n\1', col) for col in qualified_columns]
df1[new_col_names] = df1[qualified_columns].div(df1.pow(2).sum(axis=1).pow(1./2), axis=0)

Result :

   p1  p2  p3   p4        n1        n2        n3        n4
0   1   4   3   40  0.024799  0.099197  0.074398  0.991973
1   2   5  70    8  0.028304  0.070760  0.990643  0.113216
2   3   6   6  500  0.005999  0.011998  0.011998  0.999838

apply function to varying number of columns using regex

Answers (2)

Related Questions