Reputation: 13
Apologies in advance if this doesn't make sense; I'm still learning the terminology. I'm fairly comfortable with Excel and am just starting out with Python/NumPy.
I'm currently working with CSV exports from something comparable to an IT ticket system. Most of the columns are consistent enough to group by, and I'm fine with that part.
One column in particular is free text describing the issue, and users may include an error code in it. For this example, assume the code always has the format "ERR#####" (e.g. ERR54321): the constant "ERR" followed by exactly 5 digits.
Is there a good way to extract that value and put it into its own column in the dataframe for that row?
The goal is to quantify the volume/frequency of the errors being reported.
Thanks in advance!
Upvotes: 0
Views: 46
Reputation: 51683
You can apply a regular expression to the dataframe column:
import pandas as pd
# prepare demo df
data = ["got ERR12345 today", "ERR 0815", "to ERR or not to ERR", "no ERR11111 now"]
df = pd.DataFrame({"code" : data})
# use regex to extract stuff and create a new column
df["ERR"] = df["code"].str.extract(r"(ERR\d{5})")
print(df)
This prints the dataframe with the new column (NaN where no code matches):
                   code       ERR
0    got ERR12345 today  ERR12345
1              ERR 0815       NaN
2  to ERR or not to ERR       NaN
3       no ERR11111 now  ERR11111
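Since the stated goal is to count how often each error occurs, here is a minimal sketch of one way to get there from the extracted column. The demo data and the column names "code" and "ERR" are made up for illustration, and value_counts() is just one option (a groupby on the new column would work too).

```python
import pandas as pd

# Hypothetical demo data; in practice this would come from pd.read_csv(...)
data = ["got ERR12345 today", "ERR 0815", "no ERR11111 now", "again ERR12345"]
df = pd.DataFrame({"code": data})

# expand=False returns a Series rather than a one-column DataFrame
df["ERR"] = df["code"].str.extract(r"(ERR\d{5})", expand=False)

# value_counts() ignores NaN by default, so rows without a code are skipped
counts = df["ERR"].value_counts()
print(counts)
```

Note that str.extract only keeps the first match in each row. If a single ticket can mention several codes, str.extractall or str.findall may be a better fit. Also, the pattern as written would match the first five digits of a longer number such as ERR123456; appending (?!\d) to the pattern is one way to rule that out, assuming such values should be rejected.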
Upvotes: 1