Reputation: 1689
For testing, I have:
import pandas as pd
test = pd.DataFrame([[1,'A', 'B', 'A B r'], [0,'A', 'B', 'A A A'], [2,'B', 'C', 'B a c'], [1,'A', 'B', 's A B'], [1,'A', 'B', 'A'], [0,'B', 'C', 'x']])
replace = [['x', 'y', 'z'], ['r', 's', 't'], ['a', 'b', 'c']]
I would like to replace each space-separated token in the last column with 0, but only if that token appears in the sublist of replace whose index is the number in the first column of that row.
For example, looking at the first three rows:
'r' is in replace[1], so 'A B r' becomes 'A B 0';
'A' is not in replace[0], so 'A A A' stays as it is;
'a' and 'c' are both in replace[2], so 'B a c' becomes 'B 0 0';
and so on.
I tried something like
test[3] = test[3].apply(lambda x: ' '.join([n if n not in replace[test[0]] else 0 for n in test.split()]))
but it's not changing anything.
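As written, that attempt would fail for a few reasons: test.split() calls split on the whole DataFrame instead of the cell x, replace[test[0]] indexes a Python list with an entire Series rather than one row's number, and the integer 0 cannot be passed to ' '.join. A minimal sketch of the same apply idea with those three points fixed, assuming the test and replace objects from the question: DataFrame.apply with axis=1 hands the lambda a whole row, so it can read column 0 and column 3 together.

```python
import pandas as pd

test = pd.DataFrame([[1, 'A', 'B', 'A B r'], [0, 'A', 'B', 'A A A'], [2, 'B', 'C', 'B a c'],
                     [1, 'A', 'B', 's A B'], [1, 'A', 'B', 'A'], [0, 'B', 'C', 'x']])
replace = [['x', 'y', 'z'], ['r', 's', 't'], ['a', 'b', 'c']]

# Series.apply only sees the values of one column, so it cannot read column 0
# of the same row; DataFrame.apply with axis=1 passes each row as a Series.
test[3] = test.apply(
    lambda row: ' '.join('0' if n in replace[row[0]] else n  # '0' is a str, as join requires
                         for n in row[3].split()),            # split this row's cell, not the DataFrame
    axis=1,
)
print(test[3].tolist())
```

Row-wise apply is convenient but usually slower than the plain list comprehensions over zip(...) used in the answers.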
Upvotes: 2
Views: 100
Reputation: 323226
Finally I know what you need
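# look up each row's sublist of replace using the number in column 0
s=pd.Series(replace).reindex(test[0])
# character-level mapping to '0'; correct here because every token is a single character
[ "".join([dict.fromkeys(y,'0').get(c, c) for c in x]) for x,y in zip(test[3],s)]
# ['A B 0', 'A A A', 'B 0 0', '0 A B', 'A', '0']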
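The comprehension above replaces character by character, which matches the token-level rule only because every token in the sample data is a single character. A small check with a hypothetical two-character token 'ra' (not part of the question's data) shows where the character-level and token-level readings diverge:

```python
replace = [['x', 'y', 'z'], ['r', 's', 't'], ['a', 'b', 'c']]

# hypothetical row: code 1, and the token 'ra' is NOT in replace[1] as a whole
row_code, text = 1, 'A ra'

lookup = dict.fromkeys(replace[row_code], '0')
char_level = "".join(lookup.get(c, c) for c in text)           # maps each character
token_level = " ".join(lookup.get(t, t) for t in text.split())  # maps each whole token
print(char_level)   # A 0a
print(token_level)  # A ra
```

If multi-character tokens can occur, splitting on whitespace first, as the other answers do, keeps the replacement at the token level.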
Upvotes: 2
Reputation: 862481
Use list comprehension with lookup in sets:
test[3] = [' '.join('0' if i in set(replace[a]) else i for i in b.split())
           for a,b in zip(test[0], test[3])]
print (test)
   0  1  2      3
0  1  A  B  A B 0
1  0  A  B  A A A
2  2  B  C  B 0 0
3  1  A  B  0 A B
4  1  A  B      A
5  0  B  C      0
Or convert the sublists to sets beforehand to improve performance:
r = [set(x) for x in replace]
test[3]=[' '.join('0' if i in r[a] else i for i in b.split()) for a,b in zip(test[0], test[3])]
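The gain comes from the first version rebuilding set(replace[a]) for every single token, while the second builds each set once. A rough timing sketch to check this on your own machine, assuming synthetic data made by repeating the question's six rows (the repeat count is arbitrary, and no particular speedup is implied):

```python
import timeit

import pandas as pd

replace = [['x', 'y', 'z'], ['r', 's', 't'], ['a', 'b', 'c']]
# synthetic data: the question's six rows repeated 10,000 times (arbitrary size)
test = pd.DataFrame([[1, 'A', 'B', 'A B r'], [0, 'A', 'B', 'A A A'], [2, 'B', 'C', 'B a c'],
                     [1, 'A', 'B', 's A B'], [1, 'A', 'B', 'A'], [0, 'B', 'C', 'x']] * 10_000)

def rebuild_each_time():
    # set(replace[a]) is rebuilt for every token
    return [' '.join('0' if i in set(replace[a]) else i for i in b.split())
            for a, b in zip(test[0], test[3])]

r = [set(x) for x in replace]

def precomputed():
    # the sets are built once, outside the loop
    return [' '.join('0' if i in r[a] else i for i in b.split())
            for a, b in zip(test[0], test[3])]

assert rebuild_each_time() == precomputed()  # same result either way
print('rebuild each time:', timeit.timeit(rebuild_each_time, number=3))
print('precomputed:      ', timeit.timeit(precomputed, number=3))
```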
Upvotes: 2
Reputation: 59274
IIUC, use zip and a list comprehension to accomplish this. I've kept it simple by writing a custom replace_ function, but feel free to use regex to perform the replacement if needed.
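def replace_(st, reps):
    # apply each (old, new) substring replacement in turn
    # (str.replace matches substrings, not only whole tokens)
    for old, new in reps:
        st = st.replace(old, new)
    return st
df = test.copy()  # the question's DataFrame
df['new'] = [replace_(b, zip(replace[a], ['0'] * len(replace[a]))) for a, b in zip(df[0], df[3])]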
Output:
   0  1  2      3    new
0  1  A  B  A B r  A B 0
1  0  A  B  A A A  A A A
2  2  B  C  B a c  B 0 0
3  1  A  B  s A B  0 A B
4  1  A  B      A      A
5  0  B  C      x      0
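Since regex is mentioned as an option, here is a minimal sketch of that variant, assuming every token consists of word characters and every sublist of replace is non-empty. It compiles one pattern per sublist with \b word boundaries, so only whole tokens are replaced, unlike str.replace, which would also hit a matching substring inside a longer token. The name patterns is illustrative.

```python
import re

import pandas as pd

test = pd.DataFrame([[1, 'A', 'B', 'A B r'], [0, 'A', 'B', 'A A A'], [2, 'B', 'C', 'B a c'],
                     [1, 'A', 'B', 's A B'], [1, 'A', 'B', 'A'], [0, 'B', 'C', 'x']])
replace = [['x', 'y', 'z'], ['r', 's', 't'], ['a', 'b', 'c']]

# one compiled pattern per sublist; \b restricts matches to whole words and
# re.escape guards against regex metacharacters inside the tokens
# (assumes non-empty sublists: an empty alternation would match empty strings)
patterns = [re.compile(r'\b(?:' + '|'.join(map(re.escape, x)) + r')\b') for x in replace]

# column 0 selects which pattern applies to the text in column 3
test['new'] = [patterns[a].sub('0', b) for a, b in zip(test[0], test[3])]
print(test['new'].tolist())
```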
Upvotes: 3