Lynn
Lynn

Reputation: 4408

Create new column from values after first underscore within a dataframe

I have a dataframe, df, where I would like to create a new column that contains the values after the first underscore, and dropping these values and keeping the middle, if there are two underscores (essentially maintaining the middle value)

data

    col1        col2
    i_ba        1
    i_sapp      1
    h_sapp_ii   1
    h_sapp_led  1
    v_bu        2
    v_bu        2

desired

col1         col2       type
i_ba         1          ba
i_sapp       1          sapp
h_sapp_ii    1          sapp
h_sapp_led   1          sapp
v_bu         2          bu
v_bu         2          bu

doing

df['col1'] = df['col1'].str.extract(r'\w*?_([^_]*)(?:_)?')

I am just not sure how to create a new column with these values. Any suggestion is appreciated

Upvotes: 1

Views: 971

Answers (1)

Tom
Tom

Reputation: 8800

You can use the pandas string split, as well as pandas string indexing:

>>> df['type'] = df['col1'].str.split('_').str[1]
>>> df
         col1  col2  type
0        i_ba     1    ba
1      i_sapp     1  sapp
2   h_sapp_ii     1  sapp
3  h_sapp_led     1  sapp
4        v_bu     2    bu
5        v_bu     2    bu

See here in the docs. Indexing with [1] means you access everything after the first underscore, up to the second. If there is no underscore, NaN is returned for that row.

Upvotes: 3

Related Questions