Reputation: 61
I would like to extract part of a string from several pandas columns, using multiple conditions applied with if/else in a loop. However, it does not work and only returns the first column. Any advice?
|col_a|col_b|
|peter--bob:5067561|peter--bob:5067561|
|chris**bbb:5067561|chris**bbb:5067561|
|bob##ccc:5067561|bob##ccc:5067561|
def get_string(df):
    cols = df.columns[0:20]
    for col in cols:
        if col.find('*') == -1:
            return df[col].astype(str).str.split('*').str[0]
        if col.find('-') == -1:
            return df[col].astype(str).str.split('-').str[0]
        if col.find('#') == -1:
            return df[col].astype(str).str.split('#').str[0]
Upvotes: 1
Views: 567
Reputation: 862761
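Your loop tests the column name instead of the column values, and the return inside the loop exits the function on the first column. The solution is to select the column with df[col], test its values with Series.str.contains, apply the split to the matching rows, and assign the result back to the DataFrame: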
def get_string(df):
    cols = df.columns[0:20]
    for col in cols:
        for v in ['*', '-', '#']:
            mask = df[col].str.contains(v, na=False, regex=False)
            df.loc[mask, col] = df.loc[mask, col].str.split(v).str[0]
    return df

print(get_string(df))
   col_a  col_b
0  peter  peter
1  chris  chris
2    bob    bob
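As a possible alternative to the nested loop, the same result can probably be had with one regex replacement per column. This is only a sketch: the sample DataFrame is rebuilt from the table in the question, strip_after_delimiter and its delimiters parameter are hypothetical names, and it assumes the columns hold strings (missing values are left as NaN).

```python
import re

import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "col_a": ["peter--bob:5067561", "chris**bbb:5067561", "bob##ccc:5067561"],
    "col_b": ["peter--bob:5067561", "chris**bbb:5067561", "bob##ccc:5067561"],
})


def strip_after_delimiter(df, delimiters="*-#"):
    # Hypothetical helper: remove everything from the first delimiter
    # character to the end of the string, in the first 20 columns.
    # re.escape keeps '*', '-' and '#' literal inside the character class.
    pattern = "[" + re.escape(delimiters) + "].*$"
    out = df.copy()  # work on a copy so the caller's DataFrame is untouched
    for col in out.columns[0:20]:
        out[col] = out[col].str.replace(pattern, "", regex=True)
    return out


result = strip_after_delimiter(df)
print(result)
```

Unlike get_string above, this version returns a new DataFrame instead of modifying df in place; drop the copy if in-place modification is what you want.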
Upvotes: 2