imxitiz
imxitiz

Reputation: 3987

Using f-string and raw string at same time

I have a dataframe like this:

           p          q
0      jdkdn  01JAN2020
1  01JAN2020   hdk789dj
2      783i3  01FEB2020

I am trying to replace 01JAN2020 by 01-01-2020, basically alphabetical month name to numeric month number. I don't know where these date are, mean in any column and any row. I have tried to do it by just using df.replace but having problem with raw string and I want to use f-string as well. Here's my try:

import pandas as pd
import re

df=pd.DataFrame({"p":["jdkdn","01JAN2020","783i3"],"q":["01JAN2020","hdk789dj","01FEB2020"]})

months = {'JAN': '01', 'FEB': '02'}
#All 12 months but for test I am using just 2

df.replace(r'(\d{2})('+'|'.join(months)+')(\d{4})',r"\1-"+

f"{{months[\\2]}}" # Here I am trying to do main things

+r"-\3",regex=True,inplace=True)
print(df)
"""
                       p                      q
0                  jdkdn  01-{months[JAN]}-2020
1  01-{months[JAN]}-2020               hdk789dj
2                  783i3  01-{months[FEB]}-2020
"""

fr"{{months[\\2]}}"
"""
                      p                     q
0                 jdkdn  01-{months[\2]}-2020
1  01-{months[\2]}-2020              hdk789dj
2                 783i3  01-{months[\2]}-2020
"""

rf"{months[\2]}" and rf"{months[\\2]}"
"""
SyntaxError: f-string expression part cannot include a backslash
"""

Idea:
Can we use regex if like (?(<month name>)<number>)?, Any function in pandas that will detect any 01JAN2020 replace that with 01-01-2020.

Expected output is:

           p           q
0      jdkdn  01-01-2020
1 01-01-2020    hdk789dj
2      783i3  01-02-2020

Upvotes: 1

Views: 239

Answers (1)

sammywemmy
sammywemmy

Reputation: 28709

You can use pandas str. replace, simply create a pattern with names, and the replacements, and use transform to apply it to all the columns:

pat = r"(?P<day>\d+)(?P<month>[A-Z]+)(?P<year>\d+)"
repl = lambda m: f"{m.group('day')}-{months[m.group('month')]}-{m.group('year')}"

df.transform(lambda x: x.str.replace(pat, repl, regex=True))
 
            p           q
0       jdkdn  01-01-2020
1  01-01-2020    hdk789dj
2       783i3  01-02-2020

Note that for strings, if you want more performance, you could write it within python and apply. You can use re.sub for that, it should give noticeable improvements (pls test):

df.applymap(lambda x: re.sub(pat, repl, x))

If your data is duplicated, an efficient approach would be to convert to a Categorical and apply the string modification to the categories; that should be faster.

Kindly note that nulls are covered implicitly by pandas string functions, within python, your code should cover these possible scenarios.

Upvotes: 3

Related Questions