Reputation: 77
I have a dataframe in the format below and am trying to use the extract function, but I keep getting the following error:
ValueError: If using all scalar values, you must pass an index
column1 column2
1 abc2150/abc2152/abc2154/abc215601/U215602
df.column2.str
.split('/',expand=True)
.apply(lambda row: row.str.extract('(\d+)', expand=True))
.apply(lambda x: '/'.join(x.dropna().astype(str)), axis=1)
I need the output in the below format.
column1 column2
1 2150/2152/2154/215601/215602
Please let me know how to fix it.
Thanks
Upvotes: 2
Views: 245
Reputation: 3926
Here is what I would do (this needs the standard-library re module):
import re
df.loc[:, "column2"] = df.column2.apply(lambda x: re.sub(r"[a-zA-Z]+", "", x))
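For anyone who wants to try this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the single row shown in the question, so the construction of df is an assumption about how the real data looks.

```python
import re

import pandas as pd

# Sample frame rebuilt from the one row shown in the question (assumed shape).
df = pd.DataFrame(
    {
        "column1": [1],
        "column2": ["abc2150/abc2152/abc2154/abc215601/U215602"],
    }
)

# Remove every run of ASCII letters from each string; slashes and digits stay.
df["column2"] = df["column2"].apply(lambda x: re.sub(r"[a-zA-Z]+", "", x))

print(df["column2"].iloc[0])  # 2150/2152/2154/215601/215602
```

Note that this strips all letters, wherever they appear, not only the ones in front of a number.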
Upvotes: -1
Reputation: 88226
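You could instead use str.replace with a positive lookahead to remove the letters that precede each numerical part. Pass regex=True, because pandas 2.0 and later treat the pattern as a literal string by default:
df.column2.str.replace(r'[a-zA-Z]+(?=\d+)', '', regex=True)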
0 2150/2152/2154/215601/215602
Name: column2, dtype: object
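A runnable sketch of the lookahead approach, assuming a frame built from the single row in the question. The second row and the names df and result are made up for illustration, to show that letters not followed by a digit are left alone. regex=True is passed explicitly so the pattern is treated as a regular expression on recent pandas versions.

```python
import pandas as pd

# First row comes from the question; the second row is a hypothetical
# example added to show what the lookahead leaves untouched.
df = pd.DataFrame(
    {
        "column1": [1, 2],
        "column2": [
            "abc2150/abc2152/abc2154/abc215601/U215602",
            "abc2150/xyz",
        ],
    }
)

# Drop a run of letters only when it is immediately followed by digits.
result = df["column2"].str.replace(r"[a-zA-Z]+(?=\d+)", "", regex=True)

print(result.tolist())
# ['2150/2152/2154/215601/215602', '2150/xyz']
```

Compared with stripping every letter, the lookahead only touches prefixes that sit directly in front of a number, which may or may not be what you want for other rows.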
Upvotes: 2