How to create new columns and insert values from cell values in a pandas dataframe

Question

I have a dataframe that is in the following format:

    geo_locations   feature                                             mau_audience
0   Aabenraa        Alcholic Drinks,Android users,Architecture,Art...   3.380211241711606,3.230448921378274,3.0,3.5910...
1   Aalborg         Alcholic Drinks,Android users,Architecture,Art...   4.113943352306837,3.8920946026904804,3.7853298...
2   Assens          Alcholic Drinks,Android users,Architecture,Art...   3.041392685158225,3.0,3.0,3.255272505103306,3....
3   Billund         Alcholic Drinks,Android users,Architecture,Art...   3.0,3.0,3.0,3.1760912590556813,3.1461280356782...
4   Bornholm        Alcholic Drinks,Android users,Architecture,Art...   3.0,3.0,3.0,3.113943352306837,3.04139268515822...
5   Esbjerg         Alcholic Drinks,Android users,Architecture,Art...   3.792391689498254,3.5563025007672873,3.5314789...

This format is awkward to work with, because I have to match positions inside the cells to get the mau_audience value for a feature in a given location. It would be much easier to work with the data in the following format:

    geo_locations   Alcoholic Drinks     Android Users        Architecture
0   Aabenraa        3.380211241711606    3.230448921378274    3.0
1   Aalborg         4.113943352306837    3.8920946026904804   3.7853298
2   Assens          3.041392685158225    3.0                  3.0
3   Billund         3.0                  3.0                  3.0
4   Bornholm        3.0                  3.0                  3.0
5   Esbjerg         3.792391689498254    3.5563025007672873   3.5314789

How would I go about reformatting this dataframe? Thanks!

Anant Kumar · Accepted Answer

The data you've shown is similar to the example below:

import pandas as pd
df = pd.DataFrame({"A": ["a,b,c", "a,b,d"], "B": ["1,2,3", "3,4,5"], "C": ["a", "b"]})

Convert the string values in each cell of the relevant columns ("A" and "B" in my case) to lists:

# Split each comma-separated string into a list of strings
df["A"] = df["A"].str.split(",")
df["B"] = df["B"].str.split(",")

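Zip each row's pair of lists into a dictionary, with the items of "A" as keys and the items of "B" as values. final_df starts out holding only column "C" and is extended in a later step:

final_df = df[["C"]]
temp_df = df.apply(lambda row: dict(zip(row["A"], row["B"])), axis=1)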

temp_df is a Series of dictionaries that looks like this:

0    {'a': '1', 'b': '2', 'c': '3'}
1    {'a': '3', 'b': '4', 'd': '5'}
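
One thing to watch for at this step: zip() stops at the shorter of its two inputs, so a row with more keys than values (or the reverse) silently loses the extras. A minimal sketch of a check you could run first. The frame here is a hypothetical variation of the example above in which the second "B" cell has only two values, and the names keys, values and mismatched are my own:

```python
import pandas as pd

# Hypothetical variation of the example: row 1 has three keys but only two values.
df = pd.DataFrame({"A": ["a,b,c", "a,b,d"], "B": ["1,2,3", "3,4"], "C": ["a", "b"]})

keys = df["A"].str.split(",")
values = df["B"].str.split(",")

# .str.len() also works on list elements, so this compares list lengths row by row.
mismatched = keys.str.len() != values.str.len()

# Rows flagged True would be truncated by zip(); inspect them before continuing.
print(df[mismatched])
```

If mismatched is all False, the dictionaries built with zip() cover every item in both columns.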

Convert the above to a DataFrame whose columns are the dictionary keys, then concatenate it with final_df:

temp_df_1 = temp_df.apply(pd.Series)
final_df = pd.concat([final_df, temp_df_1], axis=1)
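
As an alternative to apply(pd.Series), the DataFrame constructor accepts a list of dictionaries directly, which tends to be noticeably faster on larger frames. This is a sketch of that variation on the same example data, not a change to the approach above; the names records and expanded are my own:

```python
import pandas as pd

df = pd.DataFrame({"A": ["a,b,c", "a,b,d"], "B": ["1,2,3", "3,4,5"], "C": ["a", "b"]})

# One dictionary per row: items of "A" become keys, items of "B" become values.
records = [
    dict(zip(keys.split(","), values.split(",")))
    for keys, values in zip(df["A"], df["B"])
]

# Keys missing from a row become NaN; reuse df's index so concat lines up row by row.
expanded = pd.DataFrame(records, index=df.index)
final_df = pd.concat([df[["C"]], expanded], axis=1)
print(final_df)
```

Passing index=df.index matters if df does not have a default 0..n-1 index, because pd.concat with axis=1 aligns on the index.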

final_df

   C  a  b    c    d
0  a  1  2    3  NaN
1  b  3  4  NaN    5
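
Applied to the columns from the question, the same steps might look like the sketch below. Assumptions: only two rows and the first three features are used, because the cells in the question are truncated, and the values are converted to float since they are numeric. The names features, values, pairs and wide are my own:

```python
import pandas as pd

# Two rows from the question, cut down to the first three features and values.
df = pd.DataFrame({
    "geo_locations": ["Aabenraa", "Assens"],
    "feature": ["Alcholic Drinks,Android users,Architecture"] * 2,
    "mau_audience": [
        "3.380211241711606,3.230448921378274,3.0",
        "3.041392685158225,3.0,3.0",
    ],
})

features = df["feature"].str.split(",")
values = df["mau_audience"].str.split(",")

# One dictionary per row, mapping feature name to its (still string) value.
pairs = pd.Series([dict(zip(f, v)) for f, v in zip(features, values)], index=df.index)

# Expand the dictionaries into columns and convert the strings to floats.
wide = pd.concat([df[["geo_locations"]], pairs.apply(pd.Series).astype(float)], axis=1)
print(wide)
```

The column names come straight from the feature strings, so any spelling in the data (for example "Alcholic Drinks") carries over to the headers.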
