J-man

Reputation: 271

Split column on first occurrence of '-'

I have a pandas DataFrame with a column that contains house numbers and a suffix. The house number and suffix are separated by a '-', although many suffixes also contain a '-'.

I have tried this:

def house_nr(x):
    y = x['house_nr'].split('-', maxsplit = 1)
    return y

df['suffix'] = df.apply(house_nr, axis=1)

Got the following error:

KeyError: ('house_nr', 'occurred at index 0')

After some other attempts I got this working:

df2 = pd.DataFrame(df['house_nr'].str.split('-', n=1).tolist(), columns=['house-number', 'suffix'])

I then join df2 back onto the original dataframe, but I don't think this solution is very clean or Pythonic.

Upvotes: 3

Views: 965

Answers (3)

piRSquared

Reputation: 294516

Using NumPy's defchararray module (exposed publicly as numpy.char)

from numpy.char import split

a = df.house_nr.values.astype(str)
pd.DataFrame(
    split(a, '-', 1).tolist(),
    df.index, ['house-number', 'suffix'])

  house-number       suffix
0          123     Rd-thing
1          456        House
2          567  House-thing

Same idea with different construction

from numpy.char import split

cols = ['house-number', 'suffix']
a = df.house_nr.values.astype(str)
pd.DataFrame(dict(zip(cols, zip(*(split(a, '-', 1))))), df.index)

  house-number       suffix
0          123     Rd-thing
1          456        House
2          567  House-thing

Setup Borrowed From @user3483203

(I'll give it back)

df = pd.DataFrame({'house_nr': ['123-Rd-thing', '456-House', '567-House-thing']})

Upvotes: 2

user3483203

Reputation: 51185

Setup

df = pd.DataFrame({'house_nr': ['123-Rd-thing', '456-House', '567-House-thing']})

          house_nr
0     123-Rd-thing
1        456-House
2  567-House-thing

Using a list comprehension with str.split, which is usually faster than the pandas string methods:

pd.DataFrame([i.split('-', 1) for i in df.house_nr], columns=['num', 'suffix'])

   num       suffix
0  123     Rd-thing
1  456        House
2  567  House-thing
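
Note that the constructor above builds a fresh 0..n-1 index. A minimal sketch of keeping the original index and joining the result back onto df, assuming every value contains at least one '-' (the index values 10, 20, 30 and the names split_df and result are made up for illustration):

```python
import pandas as pd

# Hypothetical frame with a non-default index to show that it is preserved.
df = pd.DataFrame(
    {'house_nr': ['123-Rd-thing', '456-House', '567-House-thing']},
    index=[10, 20, 30],
)

# Split each string once, on the first '-', and reuse df's index
# so the new columns line up with the original rows.
split_df = pd.DataFrame(
    [s.split('-', 1) for s in df['house_nr']],
    index=df.index,
    columns=['num', 'suffix'],
)

# join aligns on the index, so no positional assumptions are needed.
result = df.join(split_df)
print(result)
```

If some value had no '-', its list would hold a single item, so the suffix for that row would come out missing rather than as a string.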

Upvotes: 2

sacuL

Reputation: 51425

Use the expand=True and n=1 arguments to str.split. expand=True returns the pieces as separate columns, and n=1 limits the split to the first occurrence of -

>>> df
          col
0  5-suffix-1
1  6-suffix-2

df[['house_number','suffix']] = df['col'].str.split('-', n=1, expand=True)

>>> df
          col house_number    suffix
0  5-suffix-1            5  suffix-1
1  6-suffix-2            6  suffix-2
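
As a rough sketch of how this behaves when a row has no '-' at all (the value '789' is a made-up example, not from the question), the missing piece comes back as a null rather than raising:

```python
import pandas as pd

# Hypothetical data: the last row has a house number but no suffix.
df = pd.DataFrame({'house_nr': ['123-Rd-thing', '456-House', '789']})

# n=1 splits only on the first '-'; expand=True returns a DataFrame
# with one column per piece, padding short rows with a missing value.
parts = df['house_nr'].str.split('-', n=1, expand=True)
df[['house_number', 'suffix']] = parts

print(df)
print(df['suffix'].isna())
```

This relies on at least one row containing a '-'. If no row does, the split yields a single column and the two-column assignment would fail, so that case needs separate handling.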

Upvotes: 1
