Reputation: 147
I have a DataFrame as shown below. I need to compare a column in the DataFrame against a list of strings and create a new column from the result.
DataFrame:
col_1
AB_SUMI
AK_SUMI
SB_LIMA
SB_SUMI
XY_SUMI
If 'AB', 'AK', or 'SB' is present in col_1, the new column should contain that value; otherwise the new column should contain '*'.
expected output:
col_1 new_col
AB_SUMI AB
AK_SUMI AK
SB_LIMA SB
SB_SUMI SB
XY_SUMI *
I tried the code below, but it did not work.
list=['AB','AK','AB']
for item in list:
    if df['col1'].str.contains(item).any():
        df['new']=item
Please help me with this. Thanks in advance.
Upvotes: 1
Views: 340
Reputation: 294278
A fun approach
import numpy as np
L = 'AB AK SB'.split()
c = df.col_1.values.astype(str)
# np.char.find returns -1 where the substring is absent (public alias of np.core.defchararray.find)
f = lambda x, s : np.char.find(x, s) >= 0
df.assign(new=np.stack([f(c, i) for i in L]).astype(object).T.dot(np.reshape(L, (-1, 1)))).replace('', '*')
col_1 new
0 AB_SUMI AB
1 AK_SUMI AK
2 SB_LIMA SB
3 SB_SUMI SB
4 XY_SUMI *
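If the matrix product above is hard to follow, here is a plainer sketch of the same per-key containment test. It swaps in np.select, which is not what the one-liner uses, and it assumes the sample data from the question. One difference to be aware of: np.select keeps only the first matching key, whereas the dot product concatenates every key that matches.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"col_1": ["AB_SUMI", "AK_SUMI", "SB_LIMA", "SB_SUMI", "XY_SUMI"]})
keys = ["AB", "AK", "SB"]

# One boolean mask per key: True where col_1 contains that key as a plain substring.
masks = [df["col_1"].str.contains(k, regex=False).to_numpy() for k in keys]

# For each row, pick the first key whose mask is True; fall back to '*'.
df["new"] = np.select(masks, keys, default="*")
print(df)
```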
Upvotes: 0
Reputation: 862691
You can use str.extract with a regex created from the list by joining it with | (or), then replace NaN using fillna:
L= ['AB','AK','SB']
a = '(' + '|'.join(L) + ')'
print (a)
(AB|AK|SB)
df['new'] = df.col_1.str.extract(a, expand=False).fillna('*')
print (df)
col_1 new
0 AB_SUMI AB
1 AK_SUMI AK
2 SB_LIMA SB
3 SB_SUMI SB
4 XY_SUMI *
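For reference, a self-contained sketch of the same extract approach, assuming the sample data from the question. Two details here are my additions rather than part of the code above: re.escape, which protects against keys containing regex metacharacters, and the leading ^, which assumes the key should only be matched at the start of the string (drop it to match anywhere).

```python
import re
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"col_1": ["AB_SUMI", "AK_SUMI", "SB_LIMA", "SB_SUMI", "XY_SUMI"]})
keys = ["AB", "AK", "SB"]

# Build "^(AB|AK|SB)"; re.escape and the ^ anchor are assumptions, see above.
pattern = "^(" + "|".join(re.escape(k) for k in keys) + ")"

# expand=False returns a Series; rows with no match are NaN and become '*'.
df["new"] = df["col_1"].str.extract(pattern, expand=False).fillna("*")
print(df)
```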
Upvotes: 2