Kevin Nash

Reputation: 1561

Pandas - Extracting all text after the 4th character

I am trying to work out how to extract all characters in a column after the 4th character.

col_a
XYZ123
ABCD001

Expected output:

col_a, new_col
XYZ123, 23
ABCD001, 001

Upvotes: 2

Views: 855

Answers (3)

RavinderSingh13

Reputation: 133640

With your shown samples, please try the following, which uses pandas' str.extract function. The regex ^.{4}(.*)$ skips the first 4 characters and captures everything after them in a capturing group, which is then saved to the new column.

df['new_col'] = df['col_a'].str.extract(r'^.{4}(.*)$',expand=False)

Output of df will be as follows:

     col_a new_col
0   XYZ123      23
1  ABCD001     001
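
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The third row ("AB") is not from the question; it is an assumed extra value added only to show what happens when a string is shorter than 4 characters (the regex does not match, so the result is missing).

```python
import pandas as pd

# "AB" is a hypothetical extra row, added to show the short-string case
df = pd.DataFrame({"col_a": ["XYZ123", "ABCD001", "AB"]})

# Skip the first 4 characters and capture the rest
df["new_col"] = df["col_a"].str.extract(r"^.{4}(.*)$", expand=False)

result = df["new_col"].tolist()
print(df)
```

If missing values are not wanted for short strings, plain slicing with .str[4:] gives an empty string in that case instead.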

Upvotes: 1

Anurag Dabas

Reputation: 24324

Try with string slicing:

df['new_col']=df['col_a'].str[4:]

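Or, if what you actually want is the first run of digits in each value, via the re module (note that this returns 123 rather than 23 for XYZ123):

import re
df['new_col'] = df['col_a'].apply(lambda x: re.findall(r'[0-9]+', x)[0])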
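
To make the difference between the two approaches concrete, here is a small sketch on the question's sample data. The variable names sliced and digits are just for illustration.

```python
import re
import pandas as pd

df = pd.DataFrame({"col_a": ["XYZ123", "ABCD001"]})

# Positional: everything after the 4th character
sliced = df["col_a"].str[4:]

# Pattern-based: the first run of digits, wherever it starts
# (the [0] raises IndexError if a value contains no digits)
digits = df["col_a"].apply(lambda x: re.findall(r"[0-9]+", x)[0])

print(sliced.tolist())
print(digits.tolist())
```

The two only agree when the digits happen to start at the 5th character, as in ABCD001.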

Upvotes: 6

wwnde

Reputation: 26676

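Another way:

Use a lookbehind to extract the word characters that follow the first 3 word characters (change {3} to {4} to drop the first 4 characters, as the question asks)

df['new_col'] = df.col_a.str.extract(r'((?<=^\w{3})\w+)', expand=False)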
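
As a quick check of how the count inside the lookbehind changes the result, here is a sketch on the sample data. The names after_3 and after_4 are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"col_a": ["XYZ123", "ABCD001"]})

# Lookbehind anchored at the start: match begins right after the first n word characters
after_3 = df["col_a"].str.extract(r"((?<=^\w{3})\w+)", expand=False)
after_4 = df["col_a"].str.extract(r"((?<=^\w{4})\w+)", expand=False)

print(after_3.tolist())
print(after_4.tolist())
```

Note that \w only matches letters, digits and underscores, so unlike slicing this stops at the first space or punctuation mark.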

Upvotes: 0
