Reputation: 1184
I want to split a column which an address to a specific column like cities and province.
I have a data frame which looks like:
df:
+----------------------------------------------------------------------------------------------------------+
|location
+----------------------------------------------------------------------------------------------------------+
| Jl. Raya Pasir Putih No.6, RT.1/RW.6, Pasir Putih, Kec. Sawangan, Kota Depok, Jawa Barat 16519, Indonesia
| Jl. Legenda Wisata, Wanaherang, Kec. Gn. Putri, Bogor, Jawa Barat 16965, Indonesia
| Jl. Blk. C7 No.17, Rangkapan Jaya Baru, Kec. Pancoran Mas, Kota Depok, Jawa Barat 16434, Indonesia
| Jl. Cibuntu Sayuran No.12, Wr. Muncang, Kec. Bandung Kulon, Kota Bandung, Jawa Barat 40211, Indonesia
| 1 KOMP, Jl. Tirtawening No.10, Cisurupan, Kec. Cibiru, Kota Bandung, Jawa Barat 40614, Indonesia
+----------------------------------------------------------------------------------------------------------+
I want to extract that into another column named City and Province
The output may look like:
df:
+-------------+-------------------+------------+
| location | Cities | province |
+-------------+-------------------+------------+
| ..... | Kota Depok | Jawa Barat |
| ..... | Bogor | Jawa Barat |
| ..... | Kota Depok | Jawa Barat |
| ..... | Kota Bandung | Jawa Barat |
| ..... | Kota Bandung | Jawa Barat |
+-------------+------------+-------------------+
I tried using this method:
def extract_city_state(a):
asplit = a.split(",")
city = asplit[-3].split()
state = asplit[-2].split()[0:1]
return city, state
df.join(
df['location'].apply(
lambda x: pd.Series(extract_city_state(x), index=["City", "State"])
)
)
but it returns
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-29-64a945be5d02> in <module>
1 df.join(
2 df['location'].apply(
----> 3 lambda x: pd.Series(extract_city_state(x), index=["City", "State"])
4 )
5 )
~\anaconda3\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
4043 else:
4044 values = self.astype(object).values
-> 4045 mapped = lib.map_infer(values, f, convert=convert_dtype)
4046
4047 if len(mapped) and isinstance(mapped[0], Series):
pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()
<ipython-input-29-64a945be5d02> in <lambda>(x)
1 df.join(
2 df['location'].apply(
----> 3 lambda x: pd.Series(extract_city_state(x), index=["City", "State"])
4 )
5 )
<ipython-input-22-f1d63ccd82dc> in extract_city_state(a)
1 def extract_city_state(a):
2 asplit = a.split(",")
----> 3 city = asplit[-3].split()
4 state = asplit[-2].split()[0:1]
5 return city, state
IndexError: list index out of range
How to overcome this?
Upvotes: 3
Views: 118
Reputation: 862511
Use only pandas str
function for avoid error
s if no match values selected by str[]
indexing - first Series.str.split
create Series of lists and last Series.str.rsplit
split only by last space, because n=1
parameter:
s = df['location'].str.split(',')
df['city'] = s.str[-3]
df['province'] = s.str[-2].str.rsplit(n=1).str[0]
print (df)
location city \
0 Jl. Raya Pasir Putih No.6, RT.1/RW.6, Pasir Pu... Kota Depok
1 Jl. Legenda Wisata, Wanaherang, Kec. Gn. Putri... Bogor
2 Jl. Blk. C7 No.17, Rangkapan Jaya Baru, Kec. P... Kota Depok
3 Jl. Cibuntu Sayuran No.12, Wr. Muncang, Kec. B... Kota Bandung
4 1 KOMP, Jl. Tirtawening No.10, Cisurupan, Kec.... Kota Bandung
province
0 Jawa Barat
1 Jawa Barat
2 Jawa Barat
3 Jawa Barat
4 Jawa Barat
Upvotes: 1
Reputation: 301
If you want to keep this as you have it as a function, just storing the lambda function result as a variable before joining it to the df works:
city_state_split = df['location'].apply(
lambda x: pd.Series(extract_city_state(x), index=["City", "State"])
)
df.join(city_state_split)
Upvotes: 1