HackerMan94
HackerMan94

Reputation: 3

How to create new columns and insert values from cell values in a pandas dataframe

I have a dataframe that is in the following format:

    geo_locations   feature                                             mau_audience
0   Aabenraa        Alcholic Drinks,Android users,Architecture,Art...   3.380211241711606,3.230448921378274,3.0,3.5910...
1   Aalborg         Alcholic Drinks,Android users,Architecture,Art...   4.113943352306837,3.8920946026904804,3.7853298...
2   Assens          Alcholic Drinks,Android users,Architecture,Art...   3.041392685158225,3.0,3.0,3.255272505103306,3....
3   Billund         Alcholic Drinks,Android users,Architecture,Art...   3.0,3.0,3.0,3.1760912590556813,3.1461280356782...
4   Bornholm        Alcholic Drinks,Android users,Architecture,Art...   3.0,3.0,3.0,3.113943352306837,3.04139268515822...
5   Esbjerg         Alcholic Drinks,Android users,Architecture,Art...   3.792391689498254,3.5563025007672873,3.5314789...

It is very annoying to work with this format, as I have to look at the indexes of the cells to get the mau_audience value for a feature in a given location. Instead of this format, it would be a lot easier to work with it in the following format:

    geo_locations   Alcoholic Drinks     Android Users      Architecture 
0   Aabenraa        3.380211241711606    3.230448921378274. 3.0              
1   Aalborg         4.113943352306837.   3.8920946026904804 3.7853298
2   Assens          3.041392685158225    3.0.               3.0
3   Billund         3.0                  3.0,               3.0
4   Bornholm        3.0                  3.0                3.0
5   Esbjerg         3.792391689498254    3.5563025007672873 3.5314789

How would I go about reformating this dataframe? Thanks!

Upvotes: 0

Views: 48

Answers (2)

Jason Cook
Jason Cook

Reputation: 1511

Assuming that every comma separated value in feature has a corresponding value in mau_audience.

Create DataFrame

data = pd.DataFrame([
['Aabenraa','Alcholic Drinks,Android users,Architecture', '3.380211241711606,3.230448921378274,3.0'],
['Aalborg','Alcholic Drinks,Android users,Architecture','4.113943352306837,3.8920946026904804,3.7853298']], 
columns = ['geo_locations','feature','mau_audience'])

Column Names

columns = data.loc[0,'feature'].split(',')

Assumed from first record for feature.

Split mau_audience & Convert Datatypes

new = data['mau_audience'].str.split(',', expand=True)
new.columns = columns
new = new.astype('float')

Merge New DataFrame on Index

data = data[['geo_locations']].merge(new,left_index=True, right_index=True)

Keeping only geo_locations from the original DataFrame.

Result

  geo_locations  Alcholic Drinks  Android users  Architecture
  0      Aabenraa         3.380211       3.230449       3.00000
  1       Aalborg         4.113943       3.892095       3.78533

Upvotes: 2

Anant Kumar
Anant Kumar

Reputation: 641

  1. The data you've shown is similar to the below example :-
import pandas as pd
df=pd.DataFrame({"A":["a,b,c","a,b,d"],"B":["1,2,3","3,4,5"],"C":['a','b']})
  1. Convert the string values in each cell of the corresponding columns ("A","B" in my case) to lists
df.A=df.A.apply(lambda x : x.split(','))
df.B=df.B.apply(lambda x : x.split(','))
  1. Convert these list of strings to dictionary and then concatenate the dataframes
final_df=df[['C']]
temp_df = df.apply(lambda row: dict(zip(row["A"], row["B"])), axis=1)

temp_df would be of this format

0    {'a': '1', 'b': '2', 'c': '3'}
1    {'a': '3', 'b': '4', 'd': '5'}
  1. Convert the above to a DataFrame such that the Keys are Columns of the DataFrame
temp_df_1=temp_df.apply(pd.Series)
final_df=pd.concat([final_df, temp_df_1], axis=1)

final_df

   C  a  b    c      d
0  a  1  2    3     NaN
1  b  3  4   NaN     5

Upvotes: 1

Related Questions