How to split a DataFrame column of strings to keep everything before the nth occurrence of a substring

Question

I have a df:

               id value
0      a_john_doe   123
1  b_robert_frost   456

I want to overwrite the 'id' column so that I chop off everything after the second '_' to get this:

               id value
0           a_john   123
1         b_robert   456

I'm trying to do a split and then rejoin but it's giving an error:

TypeError: sequence item 0: expected str instance, list found

I can do the same thing on a hard-coded string, so I'm not sure where I'm going wrong:

print('_'.join('a_john_doe'.split('_')[:2]))
# test gives back 'a_john'

import pandas as pd

df = pd.DataFrame({'id':['a_john_doe','b_robert_frost'], 'value':['123','456']})
df.id = '_'.join(df.id.str.split('_')[:2])
print(df)

Quang Hoang · Accepted Answer

Let's do:

df['id'] = ['_'.join(x.split('_')[:2]) for x in df['id']]

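Or in your style, using .str[:2] to slice each list (a plain [:2] slices the rows of the Series, not the lists inside it):

df['id'] = df['id'].str.split('_').str[:2].agg('_'.join)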

Output:

         id  value
0    a_john    123
1  b_robert    456
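
If you need this for an arbitrary n, the same idea can be wrapped in a small helper. This is only a sketch: the name keep_before_nth and its parameters are made up for illustration and are not part of pandas. It relies on Series.str.split, the .str[:n] slice accessor and Series.str.join, and assumes every value in the column is a string.

```python
import pandas as pd

def keep_before_nth(series: pd.Series, sep: str, n: int) -> pd.Series:
    """Keep the part of each string before the n-th occurrence of sep.

    Hypothetical helper for illustration; assumes all values are strings.
    """
    # .str.split gives one list per row; .str[:n] slices inside each list.
    # A plain [:n] would slice the rows of the Series instead.
    # Series.str.join then glues the kept pieces back together with sep.
    return series.str.split(sep).str[:n].str.join(sep)

df = pd.DataFrame({'id': ['a_john_doe', 'b_robert_frost'],
                   'value': ['123', '456']})
df['id'] = keep_before_nth(df['id'], '_', 2)
print(df)
```

Strings with fewer than n separators come back unchanged, because slicing a short list just returns the whole list.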
