Using f-string and raw string at same time

Question

I have a dataframe like this:

           p          q
0      jdkdn  01JAN2020
1  01JAN2020   hdk789dj
2      783i3  01FEB2020

I am trying to replace 01JAN2020 by 01-01-2020, basically alphabetical month name to numeric month number. I don't know where these date are, mean in any column and any row. I have tried to do it by just using df.replace but having problem with raw string and I want to use f-string as well. Here's my try:

import pandas as pd
import re

df=pd.DataFrame({"p":["jdkdn","01JAN2020","783i3"],"q":["01JAN2020","hdk789dj","01FEB2020"]})

months = {'JAN': '01', 'FEB': '02'}
#All 12 months but for test I am using just 2

df.replace(r'(\d{2})('+'|'.join(months)+')(\d{4})',r"\1-"+

f"{{months[\2]}}" # Here I am trying to do main things

+r"-\3",regex=True,inplace=True)
print(df)
"""
                       p                      q
0                  jdkdn  01-{months[JAN]}-2020
1  01-{months[JAN]}-2020               hdk789dj
2                  783i3  01-{months[FEB]}-2020
"""

fr"{{months[\2]}}"
"""
                      p                     q
0                 jdkdn  01-{months[\2]}-2020
1  01-{months[\2]}-2020              hdk789dj
2                 783i3  01-{months[\2]}-2020
"""

rf"{months[\2]}" and rf"{months[\2]}"
"""
SyntaxError: f-string expression part cannot include a backslash
"""

Idea:
Can we use regex if like (?())?, Any function in pandas that will detect any 01JAN2020 replace that with 01-01-2020.

Expected output is:

           p           q
0      jdkdn  01-01-2020
1 01-01-2020    hdk789dj
2      783i3  01-02-2020

sammywemmy · Accepted Answer

You can use pandas str. replace, simply create a pattern with names, and the replacements, and use transform to apply it to all the columns:

pat = r"(?P\d+)(?P[A-Z]+)(?P\d+)"
repl = lambda m: f"{m.group('day')}-{months[m.group('month')]}-{m.group('year')}"

df.transform(lambda x: x.str.replace(pat, repl, regex=True))
 
            p           q
0       jdkdn  01-01-2020
1  01-01-2020    hdk789dj
2       783i3  01-02-2020

Note that for strings, if you want more performance, you could write it within python and apply. You can use re.sub for that, it should give noticeable improvements (pls test):

df.applymap(lambda x: re.sub(pat, repl, x))

If your data is duplicated, an efficient approach would be to convert to a Categorical and apply the string modification to the categories; that should be faster.

Kindly note that nulls are covered implicitly by pandas string functions, within python, your code should cover these possible scenarios.

Using f-string and raw string at same time

Answers (1)

Related Questions