Shanoo
Shanoo

Reputation: 1255

Replace a pandas column by splitting the text based on "_"

I have a pandas dataframe as below

import pandas as pd
df = pd.DataFrame({'col':['abcfg_grp_202005', 'abcmn_abc_202009', 'abcgd_xyz_8976', 'abcgd_lmn_1']})
df

    col
0   abcfg_grp_202005
1   abcmn_abc_202009
2   abcgd_xyz_8976
3   abcgd_lmn_1

I want to replace 'col' as fist instance before _ in "col". IF there is a single digit in the 3rd instance after _ then append that to end of "col" as below

    col
0   abcfg
1   abcmn
2   abcgd
3   abcgd_1

Upvotes: 6

Views: 1340

Answers (7)

Shamsiddin Parpiev
Shamsiddin Parpiev

Reputation: 99

I wrote function. Then i used .apply() built-in method to apply my function to each value.

def editcols(col_value):
    splitted_col_value = col_value.split('_')
    if len(splitted_col_value[2])==1:
        return f'{splitted_col_value[0]}_{splitted_col_value[2]}'
    else:
        return splitted_col_value[0]

df['col'] = df['col'].apply(editcols)

I hope it is clear. Please let me know if it worked

Upvotes: 0

NYC Coder
NYC Coder

Reputation: 7604

Here's another way to do it:

df['col'] = np.where(df['col'].str.contains(r'[a-zA-Z0-9]+_[a-zA-Z0-9]+_[0-9]\b', regex=True),
                     df['col'].str.split('_').str[0] + '_' +  df['col'].str.split('_').str[2],
                     df['col'].str.split('_').str[0])
print(df)


       col
0    abcfg
1    abcmn
2    abcgd
3  abcgd_1

Upvotes: 1

Bora Çolakoğlu
Bora Çolakoğlu

Reputation: 69

Try this

for i in range(len(df)):
    x = df.loc[i,"col"].split('_')
    if(len(x[2])==1):
        df.loc[i,"col"] = x[0]+"_"+x[2]
    else:
        df.loc[i,"col"] = x[0]

Split the data then check if the 2nd index value's lenght. If it is 1, make the data in column splited[0] + splited[2], if not it is just splited[0]

Upvotes: 0

Tim Biegeleisen
Tim Biegeleisen

Reputation: 522254

Perhaps not the most elegant answer, but I would recommend using str.replace twice here:

df["col"]= df["Team"]
    .str.replace("^([^_]+)_.*?(?!_\d$).{2}$", "\\1")
    .str.replace("_[^_]+(?=_)", "")

The first regex targets inputs of the form abcfg_grp_202005 which do not end in underscore followed by a digit. In this case, we would be left with abcfg. The second regex removes the middle underscore term, should it still exist, which would only be true for inputs like abcgd_lmn_1 ending in underscore followed by a digit.

Upvotes: 0

Equinox
Equinox

Reputation: 6758

You can apply a custom function.

import pandas as pd
df = pd.DataFrame({'col':['abcfg_grp_202005', 'abcmn_abc_202009', 'abcgd_xyz_8976', 'abcgd_lmn_1']})
def func(x):
    ar = x.split('_')
    if len(ar[2]) == 1 and ar[2].isdigit():
        return ar[0]+"_"+ar[2]
    else:
        return ar[0]
    
df['col'] = df['col'].apply(lambda x: func(x))
df

    col
0   abcfg
1   abcmn
2   abcgd
3   abcgd_1

Upvotes: 2

ALollz
ALollz

Reputation: 59579

Split on the underscores, then add the strings. Here we can use the trick that False multiplied by a string returns the empty string to deal with the conditional addition. The check is a 1 character string that is a digit.

df1 = df['col'].str.split('_', expand=True)
df['col'] = df1[0] + ('_' + df1[2])*(df1[2].str.len().eq(1) & df1[2].str.isdigit())

print(df)

       col
0    abcfg
1    abcmn
2    abcgd
3  abcgd_1

Upvotes: 2

Mayank Porwal
Mayank Porwal

Reputation: 34086

You can use df.apply:

In [1441]: df['col'] = df.col.str.split('_', expand=True).apply(lambda x: (x[0] + '_' + x[2]) if len(x[2]) == 1 else x[0], axis=1)

In [1442]: df
Out[1442]: 
       col
0    abcfg
1    abcmn
2    abcgd
3  abcgd_1

Upvotes: 3

Related Questions