Using Python pandas, I have been attempting to use a function as one of a few replacement values for a `pandas.DataFrame` (i.e. one of the replacements should itself be the result of a function call). My understanding is that `pandas.DataFrame.replace` delegates internally to `re.sub`, and that anything that works with `re.sub` should also work with `pandas.DataFrame.replace`, provided that the `regex` parameter is set to `True`.
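For reference, this is the part that does mirror `re.sub`: with `regex=True` and a *string* replacement value, `pandas.DataFrame.replace` substitutes within each matching cell and honours backreferences such as `\1`. A minimal sketch; the column names and data are made up, and `dtype=object` is only there to keep plain object columns across pandas versions:

```python
import pandas as pd

# Hypothetical data; dtype=object keeps both columns as plain Python strings.
df = pd.DataFrame({'col1': ['alice@example.com', 'bob@example.com', 'n/a'],
                   'col2': ['alice@example.com', 'x', 'y']}, dtype=object)

# Nested dict: only col1 is searched. The string replacement is fed to the
# regex substitution, so \1 refers to the first capture group as in re.sub.
out = df.replace({'col1': {r'(\w+)@example\.com': r'\1'}}, regex=True)
print(out['col1'].tolist())  # ['alice', 'bob', 'n/a']
print(out['col2'].tolist())  # unchanged
```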
Accordingly, I followed the guidance given elsewhere on Stack Overflow for `re.sub` and attempted to apply it to `pandas.DataFrame.replace` (calling `replace` with `regex=True, inplace=True`, and with `to_replace` set either as a nested dictionary, when targeting a specific column, or otherwise as two lists, per its documentation). My code works fine without a function call, but fails as soon as I provide a function as one of the replacement values, even though I pass it in the same way as I do to `re.sub` (which I tested, and which worked correctly). I realize that the function is expected to accept a match object as its only required parameter and to return a string.
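For comparison, the `re.sub` behaviour being relied on is that a callable `repl` is invoked once per match with the `re.Match` object, and whatever string it returns is spliced into the result. A minimal sketch; the function `shout` and the pattern are hypothetical:

```python
import re


def shout(match):
    # Receives an re.Match object; must return the replacement string.
    return match.group(1).upper()


result = re.sub(r'(\w+)!', shout, 'hi! there')
print(result)  # HI there
```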
Instead of containing the result of the function call, the resulting `DataFrame` contains the function itself (i.e. as a first-class, uncalled object).
Why is this occurring, and how can I get this to work correctly (i.e. have the function called and its result stored)? If this is not possible, I would appreciate a viable, "Pandasonic" alternative.
I provide an example of this below:
```python
def fn(match):
    # file_name is assumed to be defined elsewhere
    match_id = match.group(1)  # requires a capture group in the pattern
    result = None
    with open(file_name, 'r') as file:
        for line in file:
            if 'string' in line:
                result = line.split()[-1]
    return result or match_id

data.replace(to_replace={'col1': {r'(string)': fn}},
             regex=True, inplace=True)
```
The above does not work: it matches the right search string, but replaces it with:
`<function fn at 0x3ad4398>`
For the above (contrived) example, the expected output is that every occurrence of "string" in `col1` is replaced with the string returned from `fn`.
However, `import re; print(re.sub(r'(string)', fn, 'test string'))` works as expected (i.e. as described above).
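A likely explanation, going by how pandas implements regex replacement (not checked against every version): `DataFrame.replace` only hands the replacement to the regex substitution when it is a string; any other value, a function included, is treated as a literal object that replaces the *whole* matching cell. `Series.str.replace`, by contrast, documents that `repl` may be a callable receiving the match object (with `regex=True`), which makes it a more Pandasonic fit for one column. A minimal sketch; `fn` here is a hypothetical stand-in that upper-cases the group instead of reading a file:

```python
import pandas as pd


def fn(match):
    # Hypothetical stand-in for the question's file lookup.
    return match.group(1).upper()


s = pd.Series(['test string', 'other'], dtype=object)

# Series/DataFrame.replace: the callable is never invoked; the matching
# cell is overwritten by the function object itself.
replaced = s.replace(r'(string)', fn, regex=True)
print(replaced.iloc[0] is fn)  # True

# Series.str.replace calls repl with each re.Match, like re.sub does.
fixed = s.str.replace(r'(string)', fn, regex=True)
print(fixed.tolist())  # ['test STRING', 'other']
```

To apply this in place, one would assign back, e.g. `data['col1'] = data['col1'].str.replace(...)`.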
My current solution (which seems sub-optimal and ad hoc to me) is as follows (ellipses indicate irrelevant additional code, which has been omitted; specific data used are contrived):
```python
import re

def _fn(match):
    ...
    return ...

def _multiple_replace(text, repl_dictionary):
    """Adapted from: http://stackoverflow.com/a/15175239
    Returns the result for the first regex that matches
    the provided text."""
    res = text  # returned unchanged if nothing matches (or dict is empty)
    for pattern in repl_dictionary.keys():
        regex = re.compile(pattern)
        res, num_subs = regex.subn(repl_dictionary[pattern], text)
        if num_subs > 0:
            break
    return res

# Raw string so that \w reaches the regex engine unescaped
repl_dict = {r'ABC.*(\w\w\w)': _fn, 'XYZ': 'replacement_string'}
data['col1'] = data['col1'].apply(_multiple_replace,
                                  repl_dictionary=repl_dict)
```