Reputation: 403

unable to split column in pandas dataframe

I have a dataframe in which one of the column has multiple values, See dummy dataset below

import pandas as pd
data = pd.DataFrame({"A" : ["EmpA","EmpB","EmpC","EmpD","EmpI"], 
                      "Summary" :["XX1","XX2","XX3","XX4","XX5"],
                      "DummyCol" :["TextAAAA","TextAAAA","TextAAAA","TextAAAA","TextAAAA"],
                     "MonthSales" : ['25:50:30','30:40','35:43:45:46:100','40:3','45:32:1:23:90']})

I applied the split function however it results it giving back only the single columns , is there any other efficient way apart from getting the results and using a join to combine the output..

data['MonthSales'].str.split(":",expand=True)

Desired data structure

Upvotes: 1

Answers (3)

ThePyGuy

Reputation: 18416

You can just type cast to the resulting dataframe to float, then rename the columns and finally join it back to the original dataframe.

out=(data.join(data.pop('MonthSales')
                .str.split(":",expand=True).astype(float)
                .rename(columns= lambda i: f"MonthSales{i+1}"))
    )

OUTPUT:

      A Summary  DummyCol  MonthSales1  MonthSales2  MonthSales3  MonthSales4  \
0  EmpA     XX1  TextAAAA         25.0         50.0         30.0          NaN   
1  EmpB     XX2  TextAAAA         30.0         40.0          NaN          NaN   
2  EmpC     XX3  TextAAAA         35.0         43.0         45.0         46.0   
3  EmpD     XX4  TextAAAA         40.0          3.0          NaN          NaN   
4  EmpI     XX5  TextAAAA         45.0         32.0          1.0         23.0   
   MonthSales5  
0          NaN  
1          NaN  
2        100.0  
3          NaN  
4         90.0

Upvotes: 1

Adeyemi Michael

Reputation: 31

Trying using Series.str.split() functions.

# import Pandas as pd
import pandas as pd

Split Name column into “First” and “Last” column respectively and add it to the existing Dataframe .

# create a new data frame
df = pd.DataFrame({'Name': ['John Larter', 'Robert Junior', 'Jonny Depp'],
                    'Age':[32, 34, 36]})

print("Given Dataframe is :\n",df)

# Adding two new columns to the existing dataframe.
# bydefault splitting is done on the basis of single space.
df[['First','Last']] = df.Name.str.split(expand=True)

print("\n After adding two new columns : \n", df)

Output:

Upvotes: 0

I'mahdi

Reputation: 24049

You can use pands.join and pandas.add_prefix.

>>> data.join(pd.DataFrame(data['MonthSales'].str.split(':', expand=True)
                      ).rename(columns=dict(zip(range(5), range(1,6)))
                              ).add_prefix('MonthSales')
         ).drop(columns='MonthSales').fillna(np.nan)



      A Summary  DummyCol MonthSales1 MonthSales2 MonthSales3 MonthSales4  \
0  EmpA     XX1  TextAAAA          25          50          30         NaN   
1  EmpB     XX2  TextAAAA          30          40         NaN         NaN   
2  EmpC     XX3  TextAAAA          35          43          45          46   
3  EmpD     XX4  TextAAAA          40           3         NaN         NaN   
4  EmpI     XX5  TextAAAA          45          32           1          23   

  MonthSales5  
0         NaN  
1         NaN  
2         100  
3         NaN  
4          90

Upvotes: 2

unable to split column in pandas dataframe

Answers (3)

Related Questions