muxevola

Reputation: 199

Pandas str.replace method regex flag raises inconsistent exceptions

When I use the regex flag (regex=True or regex=False) in the pd.Series.str.replace() method, I get exceptions that seem to contradict each other.

I'm trying to replace the month part of a Spanish date from a DataFrame index with the corresponding English short name.

import pandas as pd
import numpy as np

# Define the months' short names in English and Spanish
ENG = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC']
ESP = ['ENE', 'FEB', 'MAR', 'ABR', 'MAY', 'JUN', 'JUL', 'AGO', 'SEP', 'OCT', 'NOV', 'DIC']

# Dictionary mapping Spanish months to English months
esp2eng = dict(zip(ESP, ENG))

# Function to make the dictionary "callable"
def eng_from_esp(key):
    return esp2eng[key]

# Create the DF with date in the "%d-%b-%y" format as index, where %b is the Spanish naming
idx = ['06-{}-19'.format(m) for m in ESP]
col = ['ordinal']
data = pd.DataFrame(np.arange(12).reshape((12, 1)),
                   index=idx,
                   columns=col)

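# First attempt: pass the dictionary itself as repl
data.index.str.replace('ENE', esp2eng, regex=False)
# TypeError: repl must be a string or callable

# Second attempt: pass the wrapper function as repl
data.index.str.replace('ENE', eng_from_esp, regex=False)
# ValueError: Cannot use a callable replacement when regex=False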

Upvotes: 4

Views: 17835

Answers (1)

pault

Reputation: 43524

If you look at the documentation for pandas.Series.str.replace you will see that the repl argument can be a string or callable, but a dict is not supported.

With that in mind, your first attempt is not supported.
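If you want to keep using the dictionary, one option is to wrap it in a callable and pass regex=True. This is a minimal sketch under one caveat: with regex=True pandas passes the callable a re.Match object rather than the matched string, so eng_from_esp from the question would not work as written, because it tries to use the match object itself as a dictionary key. The helper eng_from_match and the alternation pattern built from the dictionary keys are illustrative choices, not something pandas provides.

```python
import re

import numpy as np
import pandas as pd

ENG = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC']
ESP = ['ENE', 'FEB', 'MAR', 'ABR', 'MAY', 'JUN', 'JUL', 'AGO', 'SEP', 'OCT', 'NOV', 'DIC']
esp2eng = dict(zip(ESP, ENG))

data = pd.DataFrame(np.arange(12).reshape((12, 1)),
                    index=['06-{}-19'.format(m) for m in ESP],
                    columns=['ordinal'])

# Alternation of the Spanish month names, e.g. "ENE|FEB|...|DIC" (escaped defensively)
pattern = '|'.join(map(re.escape, esp2eng))

def eng_from_match(m):
    # Hypothetical helper: with regex=True, pandas hands us a re.Match,
    # so look up the matched text rather than the match object itself
    return esp2eng[m.group(0)]

data.index = data.index.str.replace(pattern, eng_from_match, regex=True)
print(data.index[:3].tolist())  # ['06-JAN-19', '06-FEB-19', '06-MAR-19']
```

The callable is needed here only because the replacement depends on which month matched; for a single fixed substitution a plain string repl is enough.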

Digging into the source code (key parts reproduced below), you can see that the check for a string or callable is done first, before the regex flag is even looked at.

# Check whether repl is valid (GH 13438, GH 15055)
if not (is_string_like(repl) or callable(repl)):
    raise TypeError("repl must be a string or callable")

if regex:
    ...  # omitted
else:
    ...  # omitted
    if callable(repl):
        raise ValueError("Cannot use a callable replacement when "
                         "regex=False")

So your first attempt (using a dictionary for repl) trips the first if check and raises a TypeError with the message "repl must be a string or callable".

Your second attempt passes this check, but then trips the callable check inside the else branch (the regex=False path).
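To make the ordering concrete, here is a stripped-down model of just the validation logic in the excerpt above. check_repl is a hypothetical name, not a pandas function, and isinstance(repl, str) stands in for pandas' internal is_string_like.

```python
def check_repl(repl, regex):
    """Hypothetical mirror of the validation order in the pandas excerpt."""
    # Check 1: type check, performed before the regex flag is consulted
    if not (isinstance(repl, str) or callable(repl)):
        raise TypeError("repl must be a string or callable")
    # Check 2: only reached by strings and callables
    if not regex and callable(repl):
        raise ValueError("Cannot use a callable replacement when regex=False")


# A dict fails check 1 whatever the regex flag is
for flag in (True, False):
    try:
        check_repl({'ENE': 'JAN'}, regex=flag)
    except TypeError as exc:
        print(f"dict, regex={flag}: TypeError: {exc}")

# A callable passes check 1 but fails check 2 when regex=False
try:
    check_repl(len, regex=False)
except ValueError as exc:
    print(f"callable, regex=False: ValueError: {exc}")

# A callable with regex=True passes both checks
check_repl(len, regex=True)
```

Each error message is accurate for the check that raises it; they only look contradictory if you assume both checks see the same inputs.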

So, in short, there is no inconsistency: the two errors come from two separate checks applied one after the other. Sure, the first error message could be improved to say something like "repl must be a string, or a callable when regex=True", but that's not really necessary.


FWIW, here is a pandas "one-liner" that should achieve the desired result:

print(
    data.reset_index()
        .replace(esp2eng, regex=True)
        .set_index("index", drop=True)
        .rename_axis(None, axis=0)
)
#           ordinal
#06-JAN-19        0
#06-FEB-19        1
#06-MAR-19        2
#06-APR-19        3
#06-MAY-19        4
#06-JUN-19        5
#06-JUL-19        6
#06-AUG-19        7
#06-SEP-19        8
#06-OCT-19        9
#06-NOV-19       10
#06-DEC-19       11
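The regex=True in the one-liner above is what makes it work. Without it, DataFrame.replace (and Series.replace) with a dict only replaces cells whose entire value equals a key, so '06-ENE-19' would be left untouched; with regex=True each key is treated as a pattern and substituted inside the strings. A small sketch on a made-up two-element Series, using a subset of the mapping for illustration:

```python
import pandas as pd

esp2eng = {'ENE': 'JAN', 'AGO': 'AUG'}  # subset of the full mapping, for illustration
s = pd.Series(['06-ENE-19', '06-AGO-19'])

# Without regex=True, keys must match whole values: nothing changes here
print(s.replace(esp2eng).tolist())              # ['06-ENE-19', '06-AGO-19']

# With regex=True, keys are patterns matched inside each string
print(s.replace(esp2eng, regex=True).tolist())  # ['06-JAN-19', '06-AUG-19']
```

Non-string columns such as ordinal are left alone by the regex replacement, which is why the one-liner can run on the whole reset DataFrame.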

Upvotes: 3
