DevLeb2022
DevLeb2022

Reputation: 685

how to split values in columns using dataframe?

I have a dataframe that is consist of 3 columns where one of these columns includes as values 2 values separated with - in one record.

I want to another columns that includes all these values but with one value each record

For this I created a function but when i run the code it crash and display the below error:

KeyError: "['loc1-loc2'] not found in axis"
Traceback:
File "f:\aienv\lib\site-packages\streamlit\script_runner.py", line 333, in _run_script
    exec(code, module.__dict__)
File "F:\AIenv\streamlit\app.py", line 346, in <module>
    df[y] = df[y].apply(splitting)
File "f:\aienv\lib\site-packages\pandas\core\series.py", line 4200, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
File "pandas\_libs\lib.pyx", line 2401, in pandas._libs.lib.map_infer
File "F:\AIenv\streamlit\app.py", line 295, in splitting
    df1=df.drop(record,axis=1).join(record.str.split("-",expand=True).stack().reset_index(level=1,drop=True).rename("spltting"))
File "f:\aienv\lib\site-packages\pandas\core\frame.py", line 4169, in drop
    errors=errors,
File "f:\aienv\lib\site-packages\pandas\core\generic.py", line 3884, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
File "f:\aienv\lib\site-packages\pandas\core\generic.py", line 3918, in _drop_axis
    new_axis = axis.drop(labels, errors=errors)
File "f:\aienv\lib\site-packages\pandas\core\indexes\base.py", line 5278, in drop
    raise KeyError(f"{labels[mask]} not found in axis")

Code:

import numpy as np
import pandas as pd

df =pd.DataFrame({
            "source_number":                        [ 
             [11199,11328,11287,32345,12342,1232,13456,123244,13456],
             "location":          
             ["loc2","loc1-loc3","loc3","loc1","loc2-loc1","loc2","loc3-loc2","loc2","loc1"],
              "category": 
             ["cat1","cat2","cat1","cat3","cat3","cat3","cat2","cat3","cat2"],
             })    

def splitting(record):
    df1=df.drop(record,axis=1).join(record.str.split("-",expand=True).stack().reset_index(level=1,drop=True).rename("spltting"))
    return df1
for y in df.columns:
      df[y] = df[y].apply(splitting)

Upvotes: 1

Views: 143

Answers (1)

jezrael
jezrael

Reputation: 862511

I think you need DataFrame.explode with split column passed to function:

def splitting(df, r):

    df[r] = df[r].str.split("-")
    return df.explode(r)

df = splitting(df, 'location')
print (df)
   source_number location category
0          11199     loc2     cat1
1          11328     loc1     cat2
1          11328     loc3     cat2
2          11287     loc3     cat1
3          32345     loc1     cat3
4          12342     loc2     cat3
4          12342     loc1     cat3
5           1232     loc2     cat3
6          13456     loc3     cat2
6          13456     loc2     cat2
7         123244     loc2     cat3
8          13456     loc1     cat2

Upvotes: 3

Related Questions