Reputation: 403
I have a dataframe in which one of the column has multiple values, See dummy dataset below
import pandas as pd
data = pd.DataFrame({"A" : ["EmpA","EmpB","EmpC","EmpD","EmpI"],
"Summary" :["XX1","XX2","XX3","XX4","XX5"],
"DummyCol" :["TextAAAA","TextAAAA","TextAAAA","TextAAAA","TextAAAA"],
"MonthSales" : ['25:50:30','30:40','35:43:45:46:100','40:3','45:32:1:23:90']})
I applied the split function however it results it giving back only the single columns , is there any other efficient way apart from getting the results and using a join to combine the output..
data['MonthSales'].str.split(":",expand=True)
Desired data structure
Upvotes: 1
Views: 327
Reputation: 18416
You can just type cast to the resulting dataframe to float, then rename the columns and finally join it back to the original dataframe.
out=(data.join(data.pop('MonthSales')
.str.split(":",expand=True).astype(float)
.rename(columns= lambda i: f"MonthSales{i+1}"))
)
OUTPUT:
A Summary DummyCol MonthSales1 MonthSales2 MonthSales3 MonthSales4 \
0 EmpA XX1 TextAAAA 25.0 50.0 30.0 NaN
1 EmpB XX2 TextAAAA 30.0 40.0 NaN NaN
2 EmpC XX3 TextAAAA 35.0 43.0 45.0 46.0
3 EmpD XX4 TextAAAA 40.0 3.0 NaN NaN
4 EmpI XX5 TextAAAA 45.0 32.0 1.0 23.0
MonthSales5
0 NaN
1 NaN
2 100.0
3 NaN
4 90.0
Upvotes: 1
Reputation: 31
Trying using Series.str.split() functions.
# import Pandas as pd
import pandas as pd
Split Name column into “First” and “Last” column respectively and add it to the existing Dataframe .
# create a new data frame
df = pd.DataFrame({'Name': ['John Larter', 'Robert Junior', 'Jonny Depp'],
'Age':[32, 34, 36]})
print("Given Dataframe is :\n",df)
# Adding two new columns to the existing dataframe.
# bydefault splitting is done on the basis of single space.
df[['First','Last']] = df.Name.str.split(expand=True)
print("\n After adding two new columns : \n", df)
Output:
Upvotes: 0
Reputation: 24049
You can use pands.join
and pandas.add_prefix
.
>>> data.join(pd.DataFrame(data['MonthSales'].str.split(':', expand=True)
).rename(columns=dict(zip(range(5), range(1,6)))
).add_prefix('MonthSales')
).drop(columns='MonthSales').fillna(np.nan)
A Summary DummyCol MonthSales1 MonthSales2 MonthSales3 MonthSales4 \
0 EmpA XX1 TextAAAA 25 50 30 NaN
1 EmpB XX2 TextAAAA 30 40 NaN NaN
2 EmpC XX3 TextAAAA 35 43 45 46
3 EmpD XX4 TextAAAA 40 3 NaN NaN
4 EmpI XX5 TextAAAA 45 32 1 23
MonthSales5
0 NaN
1 NaN
2 100
3 NaN
4 90
Upvotes: 2