Reputation: 403
I have a Pandas dataframe with several columns wherein the entries of each column are a combination of numbers, upper and lower case letters and some special characters:, i.e, "=A-Za-z0-9_|"
. Each entry of the column is of the form:
'x=ABCDefgh_5|123|'
I want to retain only the numbers 0-9
appearing only between | |
and strip out all other characters. Here is my code for one column of the dataframe:
list(map(lambda x: x.lstrip(r'\[=A-Za-z_|,]+'), df[1]))
However, the code returns the full entry 'x=ABCDefgh_5|123|'
without stripping out anything. Is there an error in my code?
Upvotes: 0
Views: 34
Reputation: 584
Instead of working with these unreadable regex expressions, you might want to consider a simple split. For example:
import pandas as pd
d = {'col': ["x=ABCDefgh_5|123|", "x=ABCDefgh_5|123|"]}
df = pd.DataFrame(data=d)
output = df["col"].str.split("|").str[1]
Upvotes: 1