Coby Viner

Reputation: 184

Python pandas: use of DataFrame.replace function with a function as a value

Using Python pandas, I have been trying to use a function as one of several replacement values for a pandas.DataFrame (i.e. one of the replacements should itself be the result of a function call). My understanding is that pandas.DataFrame.replace delegates internally to re.sub, and that anything that works with re.sub should therefore also work with pandas.DataFrame.replace, provided that the regex parameter is set to True.

Accordingly, I followed guidance given elsewhere on Stack Overflow for re.sub and tried to apply it to pandas.DataFrame.replace (with regex=True, inplace=True, and to_replace set either to a nested dictionary, when targeting a specific column, or otherwise to two lists, per its documentation). My code works fine without a function call, but fails when I provide a function as one of the replacement values, even though I do so in the same manner as with re.sub (which I tested, and which worked correctly). I realize that the function is expected to accept a match object as its only required parameter and to return a string.

Instead of containing the result of the function call, the resulting DataFrame contains the function itself (i.e. the uncalled, first-class function object).

Why is this occurring, and how can I get this to work correctly (i.e. call the function and store its result)? If this is not possible, I would appreciate it if a viable and "Pandasonic" alternative could be suggested.


I provide an example of this below:

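def fn(match):
    # The pattern 'string' has no capture group, so use the whole match;
    # match.group(1) would raise IndexError here.
    matched = match.group(0)
    result = None
    # file_name is assumed to be defined elsewhere.
    with open(file_name, 'r') as file:
        for line in file:
            if 'string' in line:
                result = line.split()[-1]
    return result or matched

data.replace(to_replace={'col1': {'string': fn}},
             regex=True, inplace=True)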

The above does not work: it finds the right search string, but replaces it with:

<function fn at 0x3ad4398>

For the above (contrived) example, the expected output is that every occurrence of "string" in col1 is replaced by the string returned from fn.

By contrast, the equivalent re.sub call, import re; print(re.sub('string', fn, 'test string')), works as expected.
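For reference, here is a minimal, self-contained sketch of the re.sub behavior described above. The function name shout and the sample text are hypothetical and chosen only for illustration; the point is that re.sub calls the function once per match and inserts whatever string it returns.

```python
import re


def shout(match):
    # Hypothetical replacement function: re.sub passes in the match object
    # and substitutes the string returned here.
    return match.group(0).upper()


result = re.sub('string', shout, 'test string')
print(result)  # test STRING
```

This is the behavior I expected pandas.DataFrame.replace to reproduce when given a function as the replacement value.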

Upvotes: 3

Views: 1927

Answers (1)

Coby Viner

Reputation: 184

My current solution (which seems sub-optimal and ad hoc to me) is as follows (ellipses indicate irrelevant additional code, which has been omitted; specific data used are contrived):

import re


def _fn(match):
    ...
    return ...


def _multiple_replace(text, repl_dictionary):
    """Adapted from: http://stackoverflow.com/a/15175239
       Returns the result for the first regex that matches
       the provided text."""
    res = text  # returned unchanged if no pattern matches
    for pattern, repl in repl_dictionary.items():
        # repl may be a string or a function taking a match object
        res, num_subs = re.subn(pattern, repl, text)
        if num_subs > 0:
            break

    return res


repl_dict = {r'ABC.*(\w\w\w)': _fn, 'XYZ': 'replacement_string'}
data['col1'] = data['col1'].apply(_multiple_replace,
                                  repl_dictionary=repl_dict)
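A possibly more "Pandasonic" alternative is pandas.Series.str.replace, which, unlike DataFrame.replace, accepts a callable as repl when regex=True. The sketch below is only an illustration under assumptions: last_three is a hypothetical stand-in for _fn, and the sample data are made up.

```python
import pandas as pd


def last_three(match):
    # Hypothetical stand-in for _fn: return the three captured characters.
    return match.group(1)


# Made-up sample data for illustration only.
data = pd.DataFrame({'col1': ['ABC-123-xyz', 'XYZ', 'other']})

col = data['col1']
# A callable repl receives the match object, as with re.sub.
col = col.str.replace(r'ABC.*(\w\w\w)', last_three, regex=True)
# A plain string repl works as usual.
col = col.str.replace('XYZ', 'replacement_string', regex=True)
data['col1'] = col

print(data['col1'].tolist())
```

Note that this chains the replacements, so a later pattern can also match text produced by an earlier one, whereas _multiple_replace stops at the first pattern that matches. Which behavior is preferable depends on the data.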

Upvotes: 3
