Reputation: 1561
How can I extract all characters in a column after the 4th character?
col_a
XYZ123
ABCD001
Expected output:
col_a, new_col
XYZ123, 23
ABCD001, D001
Upvotes: 2
Views: 855
Reputation: 133640
With your shown samples, you could try the following, which uses the Pandas str.extract
function. The regex ^.{4}(.*)$ skips the first 4 characters and captures everything
after them in a group, which is then saved to the new column.
df['new_col'] = df['col_a'].str.extract(r'^.{4}(.*)$',expand=False)
Output of df will be as follows:
col_a new_col
0 XYZ123 23
1 ABCD001 001
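For anyone who wants to run this end to end, here is a minimal sketch. The DataFrame is rebuilt from the question's samples, and the extra row "AB" is my own addition (not in the question) to show what I would expect for values shorter than 4 characters: the pattern does not match, so the result is missing rather than an empty string.

```python
import pandas as pd

# Sample data from the question; "AB" is a hypothetical extra row
# added only to illustrate the no-match case.
df = pd.DataFrame({"col_a": ["XYZ123", "ABCD001", "AB"]})

# Skip exactly 4 characters, capture the rest.
# expand=False returns a Series instead of a one-column DataFrame.
df["new_col"] = df["col_a"].str.extract(r"^.{4}(.*)$", expand=False)

print(df["new_col"].tolist()[:2])  # ['23', '001']
print(df["new_col"].isna().tolist())  # [False, False, True]
```

If you would rather have an empty string than a missing value for short inputs, chaining .fillna('') after the extract should do it.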
Upvotes: 1
Reputation: 24324
Try with string slicing:
df['new_col']=df['col_a'].str[4:]
OR
Via the re module (this takes the first run of digits in each value rather than slicing by position):
import re
df['new_col']=df['col_a'].apply(lambda x:re.findall('[0-9]+', x)[0])
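The two options above are not equivalent, which a quick side-by-side on the question's samples makes visible. This is only a sketch; the column names by_position and first_digits are mine, chosen for illustration.

```python
import re

import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"col_a": ["XYZ123", "ABCD001"]})

# Positional slice: everything after the 4th character.
df["by_position"] = df["col_a"].str[4:]

# First run of digits, wherever it starts in the string.
df["first_digits"] = df["col_a"].apply(lambda x: re.findall(r"[0-9]+", x)[0])

print(df["by_position"].tolist())   # ['23', '001']
print(df["first_digits"].tolist())  # ['123', '001']
```

Slicing depends only on position, so it is the closer match to "after the 4th character". The findall version depends on content, and the [0] index raises IndexError for any value that contains no digits.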
Upvotes: 6
Reputation: 26676
Another way:
Extract the alphanumerics that follow the first 3 alphanumerics
df['new_col']= df.col_a.str.extract(r'((?<=^\w{3})\w+)')
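A runnable sketch of the lookbehind approach on the question's samples follows. The raw-string prefix and expand=False are my additions (to avoid escape-sequence warnings and to get a Series back); the pattern itself is unchanged. Note that it skips 3 characters, not 4.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"col_a": ["XYZ123", "ABCD001"]})

# (?<=^\w{3}) asserts that exactly 3 word characters from the start of
# the string precede the match; \w+ then captures the word characters
# that follow them.
df["new_col"] = df["col_a"].str.extract(r"((?<=^\w{3})\w+)", expand=False)

print(df["new_col"].tolist())  # ['123', 'D001']
```

This reproduces the D001 row of the expected output in the question, but gives 123 rather than 23 for the first row, so which offset is right depends on what the asker actually needs.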
Upvotes: 0