Reputation: 4577
I have a DataFrame that looks like this:
Full Partial
ABCDEFGHIJKLMNOPQRSTUVWXYZ FGHIJKL
ANLHDFKNADHFBAKHFGBAKJFB FKNADH
JABFKADFNADKHFBADHBFJDHFBADF ABFKA
What I want to do is to put everything from Full that does NOT match Partial in lowercase, yielding the following:
Coverage
abcde_FGHIJKL_mnopqrstuvwxyz
anlhd_FKNADH_fbakhfgbakjfb
j_ABFKA_dfnadkhfbadhbfjdhfbadf
How would I do this? I looked around and it seems that series.str.extract() could be a solution, but I'm not certain, because when I try this:
df['Full'].str.extract(df['Partial'])
... it raises an error saying that a Series can't be hashed. I assume that extract only takes a single pattern rather than a Series? Is there any way around this? Is extract even the correct way to achieve what I'm looking for, or is there another way? I'm thinking I could perhaps find some way to get the string indexes and do the following pseudocode:
df['Coverage'] = df['Full'][:start].lower() + '_' + df['Partial'] + \
    '_' + df['Full'][end:].lower()
... where start and end are the indexes at which df['Partial'] starts and ends, respectively. Thoughts?
Upvotes: 0
Views: 310
Reputation: 64443
Not the most elegant perhaps, but here is one solution:
For df:
Full Partial
0 ABCDEFGHIJKLMNOPQRSTUVWXYZ FGHIJKL
1 ANLHDFKNADHFBAKHFGBAKJFB FKNADH
2 JABFKADFNADKHFBADHBFJDHFBADF ABFKA
This:
df.apply(lambda r: r.Full.lower().replace(r.Partial.lower(), '_' + r.Partial + '_'), axis=1)
Returns:
0 abcde_FGHIJKL_mnopqrstuvwxyz
1 anlhd_FKNADH_fbakhfgbakjfb
2 j_ABFKA_dfnadkhfbadhbfjdhfbadf
For each row, this converts the full string to lowercase, then replaces the lowercased partial string with the original partial string, with an underscore added on each side.
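For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question, applies the row-wise replace, and also shows an index-based variant along the lines of the question's pseudocode. The helper name mark_coverage and the Coverage_idx column are made up for illustration, and lowercasing the whole string when Partial is not found is my assumption, not something the question specifies.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Full": [
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
        "ANLHDFKNADHFBAKHFGBAKJFB",
        "JABFKADFNADKHFBADHBFJDHFBADF",
    ],
    "Partial": ["FGHIJKL", "FKNADH", "ABFKA"],
})


def mark_coverage(full, partial):
    """Lowercase `full` except for the first occurrence of `partial`.

    Hypothetical helper: it follows the start/end idea from the question.
    """
    start = full.find(partial)
    if start == -1:
        # Assumption: with no match, the whole string is simply lowercased.
        return full.lower()
    end = start + len(partial)
    return full[:start].lower() + "_" + partial + "_" + full[end:].lower()


# Row-wise replace: lowercase everything, then swap the lowercased
# partial string back in as the original, wrapped in underscores.
df["Coverage"] = df.apply(
    lambda r: r.Full.lower().replace(r.Partial.lower(), "_" + r.Partial + "_"),
    axis=1,
)

# Index-based variant: locate the match once per row and slice around it.
df["Coverage_idx"] = [
    mark_coverage(full, partial)
    for full, partial in zip(df["Full"], df["Partial"])
]

print(df[["Coverage", "Coverage_idx"]])
```

The two columns agree on this sample. They can differ on other data: the replace version marks every occurrence of Partial in Full, while the index-based version marks only the first one.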
Upvotes: 2