Python/Pandas newbie - Extract value within column with semi-consistent value to another column

Question

Apologies in advance if I don't make sense; I'm still working out all of the terminology, but I'm pretty good at Excel and am on my Python/NumPy journey.

I'm currently working with CSVs exported from what I would compare to an IT ticket system. It has various columns that are consistent enough to group by, and I'm OK with that part.

One column in particular is free text describing the issue, but users may include an error code. For this example, say the code always has the format "ERR#####", e.g. ERR54321: the constant "ERR", always followed by 5 digits.

Is there a best way to extract that particular value and put it into its own column in the dataframe for that row?

The goal is to quantify the volume/frequency of the errors being reported.

Thanks in advance!

Patrick Artner · Accepted Answer

You can use a regular expression on the dataframe column:

import pandas as pd

# prepare demo df
data = ["got ERR12345 today", "ERR 0815", "to ERR or not to ERR", "no ERR11111 now"]
df = pd.DataFrame({"code": data})

# extract the first occurrence of "ERR" followed by exactly five digits
# into a new column; rows without a match get NaN
df["ERR"] = df["code"].str.extract(r"(ERR\d{5})")

print(df)

This prints the dataframe with the new column:

                   code       ERR
0    got ERR12345 today  ERR12345
1              ERR 0815       NaN
2  to ERR or not to ERR       NaN
3       no ERR11111 now  ERR11111
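
To get at the stated goal of counting how often each error shows up, one option is to call value_counts() on the new column. This is only a minimal sketch: the fifth row ("ERR12345 again") is a made-up addition to the demo data so that one code repeats, and passing expand=False is my own choice to get a Series back rather than a one-column DataFrame.

```python
import pandas as pd

# demo data from above, plus one hypothetical extra row so a code repeats
data = [
    "got ERR12345 today",
    "ERR 0815",
    "to ERR or not to ERR",
    "no ERR11111 now",
    "ERR12345 again",  # assumption: added only for illustration
]
df = pd.DataFrame({"code": data})

# expand=False returns a Series, which is assigned as a single column
df["ERR"] = df["code"].str.extract(r"(ERR\d{5})", expand=False)

# value_counts() leaves out NaN by default, so rows without a code are not counted
counts = df["ERR"].value_counts()
print(counts)
```

Note that str.extract only keeps the first match per row. If a single ticket can mention several codes, str.extractall is probably the place to look instead.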
