Reputation: 3
I have the DF below and in the address column there is a string that looks like a dictionary.
I'm trying to extract the data and turn it into new columns, but I can't.
df['address'][0]
"{'district': 'Centro', 'city': 'SEROPEDICA', 'state': 'RJ', 'country': 'BR', 'latitude': -22.744438, 'longitude': -43.70285, 'timezone': 'Etc/GMT+3', 'zipCode': '23895005', 'streetName': 'Avenida Ministro Fernando Costa', 'streetNumber': '547'}"
type(df.loc[0,'address'])
str
I try:
newdf=pd.DataFrame(data=df['address'].tolist())
newdf=pd.concat((df,newdf),axis=1)
address_df = df.pop("address")
new_df = pd.concat([df, address_df.apply(pd.Series)], axis = 1)
pd.concat([df.drop(columns='Adress'), pd.json_normalize(df['Adress'])], axis=1)
nothing thats work
Upvotes: 0
Views: 53
Reputation: 1093
Here is a solution:
I created a sample dataset:
dic = {'district': 'Centro', 'city': 'SEROPEDICA', 'state': 'RJ', 'country': 'BR', 'latitude': -22.744438, 'longitude': -43.70285, 'timezone': 'Etc/GMT+3', 'zipCode': '23895005', 'streetName': 'Avenida Ministro Fernando Costa', 'streetNumber': '547'}
df = pd.DataFrame({'A': np.random.randint(10, size=10),
'B': np.random.randint(10, size=10),
'address' : str(dic) })
Looks like:
A B address
0 0 6 {'district': 'Centro', 'city': 'SEROPEDICA', '...
1 4 5 {'district': 'Centro', 'city': 'SEROPEDICA', '...
2 9 9 {'district': 'Centro', 'city': 'SEROPEDICA', '...
3 8 7 {'district': 'Centro', 'city': 'SEROPEDICA', '...
4 4 6 {'district': 'Centro', 'city': 'SEROPEDICA', '...
5 2 4 {'district': 'Centro', 'city': 'SEROPEDICA', '...
6 6 9 {'district': 'Centro', 'city': 'SEROPEDICA', '...
7 9 5 {'district': 'Centro', 'city': 'SEROPEDICA', '...
8 5 0 {'district': 'Centro', 'city': 'SEROPEDICA', '...
9 6 4 {'district': 'Centro', 'city': 'SEROPEDICA', '...
I created a temp dataframe with dictionary values:
df_temp = df.apply(lambda row : pd.Series(eval(row['address'])) , axis=1)
df= pd.concat([df, df_temp], axis=1).drop(['address'],axis=1)
Final Dataset:
A B district city state country latitude longitude timezone zipCode streetName streetNumber
0 5 5 Centro SEROPEDICA RJ BR -22.744438 -43.70285 Etc/GMT+3 23895005 Avenida Ministro Fernando Costa 547
1 6 8 Centro SEROPEDICA RJ BR -22.744438 -43.70285 Etc/GMT+3 23895005 Avenida Ministro Fernando Costa 547
2 4 7 Centro SEROPEDICA RJ BR -22.744438 -43.70285 Etc/GMT+3 23895005 Avenida Ministro Fernando Costa 547
3 8 7 Centro SEROPEDICA RJ BR -22.744438 -43.70285 Etc/GMT+3 23895005 Avenida Ministro Fernando Costa 547
4 1 9 Centro SEROPEDICA RJ BR -22.744438 -43.70285 Etc/GMT+3 23895005 Avenida Ministro Fernando Costa 547
5 1 2 Centro SEROPEDICA RJ BR -22.744438 -43.70285 Etc/GMT+3 23895005 Avenida Ministro Fernando Costa 547
6 0 2 Centro SEROPEDICA RJ BR -22.744438 -43.70285 Etc/GMT+3 23895005 Avenida Ministro Fernando Costa 547
7 4 8 Centro SEROPEDICA RJ BR -22.744438 -43.70285 Etc/GMT+3 23895005 Avenida Ministro Fernando Costa 547
8 0 7 Centro SEROPEDICA RJ BR -22.744438 -43.70285 Etc/GMT+3 23895005 Avenida Ministro Fernando Costa 547
9 1 7 Centro SEROPEDICA RJ BR -22.744438 -43.70285 Etc/GMT+3 23895005 Avenida Ministro Fernando Costa 547
Note: In this case address
column has the same value in every row. I assume, in your original data has different dictionaries in every row.
Upvotes: 1
Reputation: 1775
You can use pd.json_normalize
:
df = df.join(pd.json_normalize(df.address))
Upvotes: 0