Nanda

Reputation: 403

Extracting portions of the entries of a Pandas dataframe

I have a Pandas dataframe with several columns in which each entry is a combination of digits, upper- and lower-case letters, and some special characters, i.e. the characters in "=A-Za-z0-9_|". Each entry of a column is of the form:

'x=ABCDefgh_5|123|'

I want to keep only the digits 0-9 that appear between the two | characters and strip out everything else. Here is my code for one column of the dataframe:

list(map(lambda x: x.lstrip(r'\[=A-Za-z_|,]+'), df[1]))

However, the code returns the full entry 'x=ABCDefgh_5|123|' without stripping out anything. Is there an error in my code?

Upvotes: 0

Views: 34

Answers (1)

braml1

Reputation: 584

Instead of working with hard-to-read regular expressions, you might want to consider a simple split on the | character. For example:

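import pandas as pd

# Sample data in the same format as the question
d = {'col': ["x=ABCDefgh_5|123|", "x=ABCDefgh_5|123|"]}
df = pd.DataFrame(data=d)

# Split each entry on "|" and keep the second piece, i.e. the text between the pipes ("123")
output = df["col"].str.split("|").str[1]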
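As for why the original attempt returns the entry unchanged: str.lstrip does not interpret its argument as a regex. It treats it as a plain set of characters to remove from the left end, and the entry starts with 'x', which is not one of the literal characters in that string. If you do prefer a regex, the following is a minimal sketch using Series.str.extract; the second sample row is made up for illustration, and the pattern assumes each entry contains exactly one run of digits enclosed by pipes.

```python
import pandas as pd

# Hypothetical sample data; the second row is invented for illustration.
df = pd.DataFrame({"col": ["x=ABCDefgh_5|123|", "y=QRStuv_7|4567|"]})

# lstrip removes leading characters found in the given set; it is not a regex.
# The set here is the literal characters \ [ = A - Z a z _ | , ] +
# 'x' is not among them, so nothing is stripped.
unchanged = "x=ABCDefgh_5|123|".lstrip(r'\[=A-Za-z_|,]+')

# Capture the digits between two pipes; expand=False returns a Series.
digits = df["col"].str.extract(r"\|(\d+)\|", expand=False)
```

Entries that do not match the pattern come back as NaN, which may be easier to spot than a wrong piece from a split.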

Upvotes: 1
