Aneho

Reputation: 79

How to apply a function to selected rows of a dataframe

I want to apply a regex function to selected rows in a dataframe. My solution works, but the code is terribly long, and I wonder whether there is a better, faster and more elegant way to solve this problem.

In words: I want my regex function to be applied to the elements of the source_value column, but only in rows where source_type == 'rhombus' AND rhombus_refer_to_odk_type is either 'integer' OR 'decimal'.

The code:

df_arrows.loc[(df_arrows['source_type']=='rhombus') & ((df_arrows['rhombus_refer_to_odk_type']=='integer') | (df_arrows['rhombus_refer_to_odk_type']=='decimal')),'source_value'] = df_arrows.loc[(df_arrows['source_type']=='rhombus') & ((df_arrows['rhombus_refer_to_odk_type']=='integer') | (df_arrows['rhombus_refer_to_odk_type']=='decimal')),'source_value'].apply(lambda x: re.sub(r'^[^<=>]+','', str(x)))

Upvotes: 1

Views: 64

Answers (1)

jezrael

Reputation: 862641

Use Series.isin to build the condition, store the mask in a variable m, and use Series.str.replace for the replacement:

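# Wrap the whole condition in parentheses so it can continue on a second line.
m = ((df_arrows['source_type']=='rhombus') &
     df_arrows['rhombus_refer_to_odk_type'].isin(['integer','decimal']))
# regex=True is required in pandas 2.0+, where str.replace matches literally by default.
df_arrows.loc[m,'source_value'] = df_arrows.loc[m,'source_value'].astype(str).str.replace(r'^[^<=>]+','', regex=True)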
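To see the mask approach end to end, here is a minimal sketch on a small made-up frame. The column names come from the question, but the sample values are invented for illustration, and it assumes a pandas version where str.replace accepts the regex keyword.

```python
import pandas as pd

# Hypothetical sample data: column names are from the question, values are invented.
df_arrows = pd.DataFrame({
    'source_type': ['rhombus', 'rhombus', 'rectangle', 'rhombus'],
    'rhombus_refer_to_odk_type': ['integer', 'decimal', 'integer', 'select_one'],
    'source_value': ['age >= 5', 'weight<3.5', 'age >= 5', 'colour = red'],
})

# Boolean mask: rhombus rows whose referenced ODK type is integer or decimal.
m = (
    (df_arrows['source_type'] == 'rhombus')
    & df_arrows['rhombus_refer_to_odk_type'].isin(['integer', 'decimal'])
)

# Strip everything before the first <, = or > in the selected rows only.
df_arrows.loc[m, 'source_value'] = (
    df_arrows.loc[m, 'source_value']
    .astype(str)
    .str.replace(r'^[^<=>]+', '', regex=True)
)

print(df_arrows['source_value'].tolist())
# ['>= 5', '<3.5', 'age >= 5', 'colour = red']
```

Only the first two rows satisfy both conditions, so the other two values are left untouched.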

EDIT: If the mask turns out to be 2-dimensional, the likely cause is duplicated column names. You can check by printing each condition separately:

print(df_arrows['source_type']=='rhombus')
print(df_arrows['rhombus_refer_to_odk_type'].isin(['integer','decimal']))
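The duplicated-name case can be reproduced with a small sketch. The frame df_dup below is hypothetical and built only to show the effect. Dropping the repeated column with columns.duplicated() is one possible fix, not necessarily the right one for your data.

```python
import pandas as pd

# Hypothetical frame in which the name 'source_type' appears twice.
df_dup = pd.DataFrame(
    [['rhombus', 'other', 'a>1'], ['rectangle', 'rhombus', 'b<2']],
    columns=['source_type', 'source_type', 'source_value'],
)

# With a duplicated name, selecting the column returns a DataFrame, not a Series,
# so the comparison produces a 2-dimensional boolean mask.
selected = df_dup['source_type']
mask = selected == 'rhombus'
print(type(selected).__name__, mask.ndim)

# List the column names that occur more than once.
dup_names = df_dup.columns[df_dup.columns.duplicated()].tolist()
print(dup_names)

# One possible fix: keep only the first occurrence of each column name.
df_clean = df_dup.loc[:, ~df_dup.columns.duplicated()]
print((df_clean['source_type'] == 'rhombus').ndim)
```

After the duplicate is dropped, the comparison yields a 1-dimensional Series again and can be used as a row mask in .loc.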

Upvotes: 1
