Python data frame column string extraction efficient way?

Question

I have a data frame df with a column ID that follows the pattern shown below. I want to return a string column containing the number after the dash. For the example below, I need 01, 01, 02. I tried the command below and it failed. Since the data frame is very large, I suspect a loop doing row-by-row extraction would be inefficient. Please advise, thanks.

df['ID'].apply(lambda x: x.split('-')[1], axis=1)

error: <lambda>() got an unexpected keyword argument 'axis'

DP00010-01
DP00020-01
..........
DP00010-02

Update: EdChum's solution

df['ID'].str.split('-').str[1]

works for me
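A minimal, self-contained version of that one-liner, assuming the frame holds only the three sample IDs shown in the question (the elided rows are left out):

```python
import pandas as pd

# Sample frame built from the three IDs shown in the question;
# the real frame is much larger.
df = pd.DataFrame({'ID': ['DP00010-01', 'DP00020-01', 'DP00010-02']})

# str.split('-') gives a list per row; .str[1] picks the element after the dash.
suffix = df['ID'].str.split('-').str[1]
print(suffix.tolist())  # ['01', '01', '02']
```

The result is still a string column, so leading zeros such as 01 are kept.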

EdChum · Accepted Answer

If you have a recent version of pandas, use the vectorised str.split method (the sample column here is named val; for the question's frame use df['ID']):

In [26]:
df['val'].str.split('-').str[1]
Out[26]:
0    01
1    01
2    02
dtype: object
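Two other pandas string methods produce the same column. Neither appears in the answer, so treat this as a sketch of alternatives rather than the accepted approach. Passing n=1 stops splitting after the first dash, and expand=True returns a DataFrame whose column 1 holds the suffix; str.extract instead captures the digits after the final dash with a regular expression, which assumes the suffix is purely numeric.

```python
import pandas as pd

df = pd.DataFrame({'ID': ['DP00010-01', 'DP00020-01', 'DP00010-02']})

# Split at most once and expand into columns 0 (prefix) and 1 (suffix).
parts = df['ID'].str.split('-', n=1, expand=True)
via_expand = parts[1]

# Regex alternative: capture the digits after the final dash.
# Assumption: the suffix is all digits.
via_extract = df['ID'].str.extract(r'-(\d+)$', expand=False)

print(via_expand.tolist())   # ['01', '01', '02']
print(via_extract.tolist())  # ['01', '01', '02']
```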

If the dash position is fixed, you can slice instead; here every prefix such as DP00010- is 8 characters long, so str[8:] keeps only the suffix:

In [28]:    
df['val'].str[8:]
Out[28]:
0    01
1    01
2    02
Name: val, dtype: object
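The slice only works when every ID has its dash at the same position (index 7 in the sample IDs). A sketch that checks that assumption before slicing and falls back to splitting otherwise; the guard is an addition, not part of the answer:

```python
import pandas as pd

df = pd.DataFrame({'ID': ['DP00010-01', 'DP00020-01', 'DP00010-02']})

# Guard the fixed-width assumption: the dash must sit at index 7 in every row.
dash_at_7 = (df['ID'].str[7] == '-').all()

if dash_at_7:
    # str[8:] is shorthand for str.slice(start=8): everything after the dash.
    suffix = df['ID'].str[8:]
else:
    # Fall back to splitting when the prefix width varies.
    suffix = df['ID'].str.split('-').str[1]

print(suffix.tolist())  # ['01', '01', '02']
```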

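As for why your method failed: you were calling apply on a Series (df['ID'] is a Series, not a DataFrame), and Series.apply has no axis parameter. Extra keyword arguments are passed straight through to your function, so the lambda received axis=1 and raised the error. Dropping axis works: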

In [29]:
df['val'].apply(lambda x: x.split('-')[1])

Out[29]:
0    01
1    01
2    02
Name: val, dtype: object
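A short sketch of the keyword forwarding behind that error: Series.apply hands extra keyword arguments to the function, so a lambda that declares the keyword accepts it, while one that does not raises TypeError. The parameter name sep is hypothetical, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'ID': ['DP00010-01', 'DP00020-01', 'DP00010-02']})

# Extra keywords go to the function: sep='-' reaches the lambda.
# (sep is an illustrative parameter name, not a pandas option.)
via_kwarg = df['ID'].apply(lambda x, sep: x.split(sep)[1], sep='-')

# Same mechanism as the original error: axis=1 is forwarded to a
# lambda that does not accept it, so Python raises TypeError.
try:
    df['ID'].apply(lambda x: x.split('-')[1], axis=1)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(via_kwarg.tolist())  # ['01', '01', '02']
print(error_message)
```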
