Reputation: 1255
I have a pandas dataframe as below
import pandas as pd
df = pd.DataFrame({'col':['abcfg_grp_202005', 'abcmn_abc_202009', 'abcgd_xyz_8976', 'abcgd_lmn_1']})
df
col
0 abcfg_grp_202005
1 abcmn_abc_202009
2 abcgd_xyz_8976
3 abcgd_lmn_1
I want to replace 'col' as fist instance before _ in "col". IF there is a single digit in the 3rd instance after _ then append that to end of "col" as below
col
0 abcfg
1 abcmn
2 abcgd
3 abcgd_1
Upvotes: 6
Views: 1340
Reputation: 99
I wrote function. Then i used .apply() built-in method to apply my function to each value.
def editcols(col_value):
splitted_col_value = col_value.split('_')
if len(splitted_col_value[2])==1:
return f'{splitted_col_value[0]}_{splitted_col_value[2]}'
else:
return splitted_col_value[0]
df['col'] = df['col'].apply(editcols)
I hope it is clear. Please let me know if it worked
Upvotes: 0
Reputation: 7604
Here's another way to do it:
df['col'] = np.where(df['col'].str.contains(r'[a-zA-Z0-9]+_[a-zA-Z0-9]+_[0-9]\b', regex=True),
df['col'].str.split('_').str[0] + '_' + df['col'].str.split('_').str[2],
df['col'].str.split('_').str[0])
print(df)
col
0 abcfg
1 abcmn
2 abcgd
3 abcgd_1
Upvotes: 1
Reputation: 69
Try this
for i in range(len(df)):
x = df.loc[i,"col"].split('_')
if(len(x[2])==1):
df.loc[i,"col"] = x[0]+"_"+x[2]
else:
df.loc[i,"col"] = x[0]
Split the data then check if the 2nd index value's lenght. If it is 1, make the data in column splited[0] + splited[2], if not it is just splited[0]
Upvotes: 0
Reputation: 522254
Perhaps not the most elegant answer, but I would recommend using str.replace
twice here:
df["col"]= df["Team"]
.str.replace("^([^_]+)_.*?(?!_\d$).{2}$", "\\1")
.str.replace("_[^_]+(?=_)", "")
The first regex targets inputs of the form abcfg_grp_202005
which do not end in underscore followed by a digit. In this case, we would be left with abcfg
. The second regex removes the middle underscore term, should it still exist, which would only be true for inputs like abcgd_lmn_1
ending in underscore followed by a digit.
Upvotes: 0
Reputation: 6758
You can apply a custom function.
import pandas as pd
df = pd.DataFrame({'col':['abcfg_grp_202005', 'abcmn_abc_202009', 'abcgd_xyz_8976', 'abcgd_lmn_1']})
def func(x):
ar = x.split('_')
if len(ar[2]) == 1 and ar[2].isdigit():
return ar[0]+"_"+ar[2]
else:
return ar[0]
df['col'] = df['col'].apply(lambda x: func(x))
df
col
0 abcfg
1 abcmn
2 abcgd
3 abcgd_1
Upvotes: 2
Reputation: 59579
Split on the underscores, then add the strings. Here we can use the trick that False
multiplied by a string returns the empty string to deal with the conditional addition. The check is a 1 character string that is a digit.
df1 = df['col'].str.split('_', expand=True)
df['col'] = df1[0] + ('_' + df1[2])*(df1[2].str.len().eq(1) & df1[2].str.isdigit())
print(df)
col
0 abcfg
1 abcmn
2 abcgd
3 abcgd_1
Upvotes: 2
Reputation: 34086
You can use df.apply
:
In [1441]: df['col'] = df.col.str.split('_', expand=True).apply(lambda x: (x[0] + '_' + x[2]) if len(x[2]) == 1 else x[0], axis=1)
In [1442]: df
Out[1442]:
col
0 abcfg
1 abcmn
2 abcgd
3 abcgd_1
Upvotes: 3