Reputation: 1
I am struggling to loop a lambda function across multiple columns.
import re
import pandas as pd
samp = pd.DataFrame({'ID':['1','2','3'], 'A':['1C22', '3X35', '2C77'],
                     'B': ['1C35', '2C88', '3X99'], 'C':['3X56', '2C73', '1X91']})
Essentially, I am trying to add three columns to this DataFrame, each containing 1 if the corresponding string contains a 'C' and 0 if it does not (i.e. it contains an 'X').
This function works fine when I apply it as a lambda function to each column individually, but I need to do this for 40 different columns, and repeating the same line for each one seems unnecessarily clunky:
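def is_correct(s):
    # Count the 'C' characters in the string (0 or 1 for these codes)
    correct = len(re.findall('C', s))
    return correct
# Use [] indexing: attribute assignment (samp.A_correct = ...) does not create a column
samp['A_correct'] = samp['A'].apply(lambda x: is_correct(x))
samp['B_correct'] = samp['B'].apply(lambda x: is_correct(x))
samp['C_correct'] = samp['C'].apply(lambda x: is_correct(x))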
I'm confident there is a way to loop this, but I have been unsuccessful thus far.
Upvotes: 0
Views: 414
Reputation: 6564
You can iterate over the columns:
import pandas as pd
import re
df = pd.DataFrame({'ID':['1','2','3'], 'A':['1C22', '3X35', '2C77'],
'B': ['1C35', '2C88', '3X99'], 'C':['3X56', '2C73', '1X91']})
def is_correct(s):
    # Count the 'C' characters in the string
    correct = len(re.findall('C', s))
    return correct
# Skip 'ID' so that only the code columns get a *_correct flag
for col in df.columns.drop('ID'):
    df[col + '_correct'] = df[col].apply(is_correct)
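A minimal sketch of a variant that skips the per-element apply, assuming (as in the question) that every column other than ID holds a code to check. Series.str.contains tests a whole column at once, and DataFrame.assign adds all the flag columns in a single call. The names value_cols and flags are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'ID': ['1', '2', '3'], 'A': ['1C22', '3X35', '2C77'],
                   'B': ['1C35', '2C88', '3X99'], 'C': ['3X56', '2C73', '1X91']})

# Assumption: every column except 'ID' holds a code to check.
value_cols = df.columns.drop('ID')

# Build every flag column at once; regex=False treats 'C' as a literal
# substring, and astype(int) turns True/False into 1/0.
flags = {f'{col}_correct': df[col].str.contains('C', regex=False).astype(int)
         for col in value_cols}

# assign() returns a new DataFrame with the extra columns appended in order.
df = df.assign(**flags)
print(df)
```

Because the flag columns are computed from value_cols before any are added, the same code handles 40 columns without listing them by hand.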
Upvotes: 1
Reputation: 150735
Let's try apply and join:
samp.join(samp[['A','B','C']].add_suffix('_correct')
.apply(lambda x: x.str.contains('C'))
.astype(int)
)
Output:
ID A B C A_correct B_correct C_correct
0 1 1C22 1C35 3X56 1 1 0
1 2 3X35 2C88 2C73 0 1 1
2 3 2C77 3X99 1X91 1 0 0
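Note that join returns a new DataFrame rather than modifying samp in place. Here is a sketch of the same approach that stores the result and picks the columns programmatically, which helps with 40 columns. It assumes every column except ID should be checked; the name cols is illustrative.

```python
import pandas as pd

samp = pd.DataFrame({'ID': ['1', '2', '3'], 'A': ['1C22', '3X35', '2C77'],
                     'B': ['1C35', '2C88', '3X99'], 'C': ['3X56', '2C73', '1X91']})

# Assumption: every column except 'ID' should be checked.
cols = samp.columns.drop('ID')

# join() returns a new DataFrame, so assign the result back to samp.
samp = samp.join(samp[cols].add_suffix('_correct')
                 .apply(lambda x: x.str.contains('C'))
                 .astype(int))
print(samp)
```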
Upvotes: 0