Reputation: 685
I have a dataframe that is consist of 3 columns where one of these columns includes as values 2 values separated with - in one record.
I want to another columns that includes all these values but with one value each record
For this I created a function but when i run the code it crash and display the below error:
KeyError: "['loc1-loc2'] not found in axis"
Traceback:
File "f:\aienv\lib\site-packages\streamlit\script_runner.py", line 333, in _run_script
exec(code, module.__dict__)
File "F:\AIenv\streamlit\app.py", line 346, in <module>
df[y] = df[y].apply(splitting)
File "f:\aienv\lib\site-packages\pandas\core\series.py", line 4200, in apply
mapped = lib.map_infer(values, f, convert=convert_dtype)
File "pandas\_libs\lib.pyx", line 2401, in pandas._libs.lib.map_infer
File "F:\AIenv\streamlit\app.py", line 295, in splitting
df1=df.drop(record,axis=1).join(record.str.split("-",expand=True).stack().reset_index(level=1,drop=True).rename("spltting"))
File "f:\aienv\lib\site-packages\pandas\core\frame.py", line 4169, in drop
errors=errors,
File "f:\aienv\lib\site-packages\pandas\core\generic.py", line 3884, in drop
obj = obj._drop_axis(labels, axis, level=level, errors=errors)
File "f:\aienv\lib\site-packages\pandas\core\generic.py", line 3918, in _drop_axis
new_axis = axis.drop(labels, errors=errors)
File "f:\aienv\lib\site-packages\pandas\core\indexes\base.py", line 5278, in drop
raise KeyError(f"{labels[mask]} not found in axis")
Code:
import numpy as np
import pandas as pd
df =pd.DataFrame({
"source_number": [
[11199,11328,11287,32345,12342,1232,13456,123244,13456],
"location":
["loc2","loc1-loc3","loc3","loc1","loc2-loc1","loc2","loc3-loc2","loc2","loc1"],
"category":
["cat1","cat2","cat1","cat3","cat3","cat3","cat2","cat3","cat2"],
})
def splitting(record):
df1=df.drop(record,axis=1).join(record.str.split("-",expand=True).stack().reset_index(level=1,drop=True).rename("spltting"))
return df1
for y in df.columns:
df[y] = df[y].apply(splitting)
Upvotes: 1
Views: 143
Reputation: 862511
I think you need DataFrame.explode
with split column passed to function:
def splitting(df, r):
df[r] = df[r].str.split("-")
return df.explode(r)
df = splitting(df, 'location')
print (df)
source_number location category
0 11199 loc2 cat1
1 11328 loc1 cat2
1 11328 loc3 cat2
2 11287 loc3 cat1
3 32345 loc1 cat3
4 12342 loc2 cat3
4 12342 loc1 cat3
5 1232 loc2 cat3
6 13456 loc3 cat2
6 13456 loc2 cat2
7 123244 loc2 cat3
8 13456 loc1 cat2
Upvotes: 3