Reputation: 137
I have a df:
id value
0 a_john_doe 123
1 b_robert_frost 456
I want to overwrite the 'id' column, chopping off everything from the second '_' onward, to get this:
id value
0 a_john 123
1 b_robert 456
I'm trying to do a split and then rejoin but it's giving an error:
TypeError: sequence item 0: expected str instance, list found
I can do the same thing on a hard-coded string, so I'm not sure where I'm going wrong:
print('_'.join('a_john_doe'.split('_')[:2]))
# test gives back 'a_john'
import pandas as pd
df = pd.DataFrame({'id':['a_john_doe','b_robert_frost'], 'value':['123','456']})
df.id = '_'.join(df.id.str.split('_')[:2])
print(df)
Upvotes: 1
Views: 1683
Reputation: 150785
Let's do:
df['id'] = ['_'.join(x.split('_')[:2]) for x in df['id']]
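Or in your style, going through .str at each step so that the slice and the join apply to each row's list rather than to the Series as a whole:
df['id'] = df['id'].str.split('_').str[:2].str.join('_')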
Output:
id value
0 a_john 123
1 b_robert 456
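As for why the original attempt raised the TypeError: df.id.str.split('_') returns a Series whose elements are lists, and a plain [:2] on that Series picks the first two rows instead of the first two items of each list, so '_'.join receives lists rather than strings. Here is a minimal sketch of that, assuming a hypothetical third row (not in the question) so that row slicing and per-row slicing give visibly different results:

```python
import pandas as pd

# Hypothetical three-row frame: the third row is an assumption added for illustration
df = pd.DataFrame({
    'id': ['a_john_doe', 'b_robert_frost', 'c_emily_dickinson_poet'],
    'value': ['123', '456', '789'],
})

parts = df['id'].str.split('_')   # Series whose elements are lists of strings

# Slicing the Series itself selects rows, not list elements
first_two_rows = parts[:2]
print(len(first_two_rows))        # 2 rows; each element is still a full list

# '_'.join(first_two_rows) would raise the TypeError from the question,
# because every item it iterates over is a list rather than a str.

# Going through .str slices and joins inside each row instead
df['id'] = parts.str[:2].str.join('_')
print(df['id'].tolist())          # ['a_john', 'b_robert', 'c_emily']
```

With only two rows, as in the question, slicing the Series with [:2] happens to keep every row, which can hide the difference.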
Upvotes: 2