Chris Chang

Reputation: 61

Extract string by multiple conditions in a for loop (Python pandas)

I would like to extract part of each string in a pandas DataFrame, choosing the delimiter with if/else conditions inside a loop. However, it does not work: the function returns only the first column. Any advice?

|col_a|col_b|
|peter--bob:5067561|peter--bob:5067561|
|chris**bbb:5067561|chris**bbb:5067561|
|bob##ccc:5067561|bob##ccc:5067561|


def get_string(df):

    cols = df.columns[0:20]

    for col in cols:

        if col.find('*') == -1: 
            return df[col].astype(str).str.split('*').str[0]

        if col.find('-') == -1:
            return df[col].astype(str).str.split('-').str[0]

        if col.find('#') == -1:
            return df[col].astype(str).str.split('#').str[0]

Upvotes: 1

Views: 567

Answers (1)

jezrael

Reputation: 862761

Your loop tests the column name instead of the column values. Select the column with df[col], test its values with Series.str.contains, then split the matching rows and assign the result back to the DataFrame:

def get_string(df):

    cols = df.columns[0:20]

    for col in cols:
        for v in ['*','-','#']:
            mask = df[col].str.contains(v, na=False, regex=False)
            df.loc[mask, col] = df.loc[mask, col].str.split(v).str[0]

    return df

print (get_string(df))
   col_a  col_b
0  peter  peter
1  chris  chris
2    bob    bob
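For comparison, here is a minimal self-contained sketch of a variation on the same idea. It is not the code from the answer above: it splits once on a regular-expression character class instead of looping over the delimiters. The helper name get_prefix and its pattern parameter are hypothetical, and the sketch assumes every selected column holds strings (or missing values) and that the delimiters are exactly `*`, `-` and `#`. The sample DataFrame is rebuilt from the table in the question.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "col_a": ["peter--bob:5067561", "chris**bbb:5067561", "bob##ccc:5067561"],
    "col_b": ["peter--bob:5067561", "chris**bbb:5067561", "bob##ccc:5067561"],
})

def get_prefix(df, pattern=r"[*#-]"):
    """Keep the text before the first '*', '#' or '-' in each string column.

    Hypothetical helper: assumes the first 20 columns contain strings.
    """
    out = df.copy()  # work on a copy so the caller's DataFrame is not modified
    for col in out.columns[0:20]:
        # A pattern longer than one character is treated as a regex;
        # n=1 stops after the first delimiter found.
        out[col] = out[col].str.split(pattern, n=1).str[0]
    return out

result = get_prefix(df)
print(result)
```

On the sample data this should give the same peter / chris / bob values as the output above. Unlike the masked assignment, it returns a new DataFrame and rewrites every row of each column, so rows without any delimiter simply keep their full string.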

Upvotes: 2
