Reputation: 271
I have a pandas DataFrame with a column that contains house numbers and a suffix. The house number and suffix are separated by a '-', although many suffixes also contain a '-'.
I have tried this:
def house_nr(x):
    y = x['house_nr'].split('-', maxsplit=1)
    return y

df['suffix'] = df.apply(house_nr, axis=1)
Got the following error:
KeyError: ('house_nr', 'occurred at index 0')
After some other attempts I've got this working:
df2 = pd.DataFrame(df['house_nr'].str.split('-', n=1).tolist(), columns=['house-number', 'suffix'])
And then I join the dataframe but I don't think this solution is very nice or pythonic.
Upvotes: 3
Views: 965
Reputation: 294516
from numpy.core.defchararray import split
a = df.house_nr.values.astype(str)
pd.DataFrame(
    split(a, '-', 1).tolist(),
    df.index, ['house-number', 'suffix'])
  house-number       suffix
0          123     Rd-thing
1          456        House
2          567  House-thing
from numpy.core.defchararray import split
cols = ['house-number', 'suffix']
a = df.house_nr.values.astype(str)
pd.DataFrame(dict(zip(cols, zip(*(split(a, '-', 1))))), df.index)
  house-number       suffix
0          123     Rd-thing
1          456        House
2          567  House-thing
Setup, borrowed from another answer (I'll give it back):
df = pd.DataFrame({'house_nr': ['123-Rd-thing', '456-House', '567-House-thing']})
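A side note on the import: numpy.core is treated as a private namespace in recent NumPy releases, and the same function is also reachable as np.char.split. A minimal sketch of the first variant written that way, assuming the setup frame above (the variable names parts and out are only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'house_nr': ['123-Rd-thing', '456-House', '567-House-thing']})

# Convert the column to a NumPy unicode array so the vectorized string routines apply.
a = df['house_nr'].to_numpy().astype(str)

# Split each element at most once on '-'; the result is an object array of lists.
parts = np.char.split(a, '-', 1)

# Rebuild a frame from the lists, reusing the original index.
out = pd.DataFrame(parts.tolist(), index=df.index, columns=['house-number', 'suffix'])
```

This assumes every value contains at least one '-'; otherwise the inner lists have different lengths and the two-column constructor will not fit.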
Upvotes: 2
Reputation: 51185
Setup
df = pd.DataFrame({'house_nr': ['123-Rd-thing', '456-House', '567-House-thing']})
          house_nr
0     123-Rd-thing
1        456-House
2  567-House-thing
Using a list comprehension and split, which is usually faster than the pandas string methods:
pd.DataFrame([i.split('-', 1) for i in df.house_nr], columns=['num', 'suffix'])
   num       suffix
0  123     Rd-thing
1  456        House
2  567  House-thing
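One thing to watch with this approach: the new frame gets a fresh 0..n-1 index, so if df has a non-default index the rows will not line up when you join them back. A minimal sketch, assuming the same house_nr column and a made-up index, that passes index=df.index to keep the rows aligned:

```python
import pandas as pd

# Hypothetical frame with a non-default index (the labels 10, 20, 30 are made up).
df = pd.DataFrame(
    {'house_nr': ['123-Rd-thing', '456-House', '567-House-thing']},
    index=[10, 20, 30],
)

# Split each value once on the first '-', reusing df's index for the new frame.
parts = pd.DataFrame(
    [s.split('-', 1) for s in df['house_nr']],
    columns=['num', 'suffix'],
    index=df.index,
)

# Attach the new columns to the original frame; join aligns on the index.
result = df.join(parts)
```

Without index=df.index the join here would match nothing, and num and suffix would come back empty.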
Upvotes: 2
Reputation: 51425
Use the expand=True and n=1 arguments to str.split. expand=True creates new columns for your split, and n=1 limits the split to the first occurrence of -.
>>> df
          col
0  5-suffix-1
1  6-suffix-2
>>> df[['house_number', 'suffix']] = df['col'].str.split('-', n=1, expand=True)
>>> df
          col house_number    suffix
0  5-suffix-1            5  suffix-1
1  6-suffix-2            6  suffix-2
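If some values have no - at all, the split still yields two columns as long as at least one row contains the separator, and the suffix is simply missing for the rows that do not. A short sketch, assuming a hypothetical extra value '7' with no suffix, that fills those gaps with an empty string:

```python
import pandas as pd

# '7' is a made-up row with no separator, added for illustration.
df = pd.DataFrame({'col': ['5-suffix-1', '6-suffix-2', '7']})

# Split once on the first '-'; rows without a '-' get a missing value in the second column.
df[['house_number', 'suffix']] = df['col'].str.split('-', n=1, expand=True)

# Replace the missing suffixes with an empty string.
df['suffix'] = df['suffix'].fillna('')
```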
Upvotes: 1