Reputation: 4408
I have a dataframe, df, where I would like to create a new column that contains the values after the first underscore, and dropping these values and keeping the middle, if there are two underscores (essentially maintaining the middle value)
data
col1 col2
i_ba 1
i_sapp 1
h_sapp_ii 1
h_sapp_led 1
v_bu 2
v_bu 2
desired
col1 col2 type
i_ba 1 ba
i_sapp 1 sapp
h_sapp_ii 1 sapp
h_sapp_led 1 sapp
v_bu 2 bu
v_bu 2 bu
doing
df['col1'] = df['col1'].str.extract(r'\w*?_([^_]*)(?:_)?')
I am just not sure how to create a new column with these values. Any suggestion is appreciated
Upvotes: 1
Views: 971
Reputation: 8800
You can use the pandas string split
, as well as pandas string indexing:
>>> df['type'] = df['col1'].str.split('_').str[1]
>>> df
col1 col2 type
0 i_ba 1 ba
1 i_sapp 1 sapp
2 h_sapp_ii 1 sapp
3 h_sapp_led 1 sapp
4 v_bu 2 bu
5 v_bu 2 bu
See here in the docs. Indexing with [1]
means you access everything after the first underscore, up to the second. If there is no underscore, NaN
is returned for that row.
Upvotes: 3