Luiz Paulo Coutinho
Luiz Paulo Coutinho

Reputation: 3

How to get data in a column that looks like a dictionary?

I have the DF below and in the address column there is a string that looks like a dictionary.

I'm trying to extract the data and turn it into new columns, but I can't.

df['address'][0]

"{'district': 'Centro', 'city': 'SEROPEDICA', 'state': 'RJ', 'country': 'BR', 'latitude': -22.744438, 'longitude': -43.70285, 'timezone': 'Etc/GMT+3', 'zipCode': '23895005', 'streetName': 'Avenida Ministro Fernando Costa', 'streetNumber': '547'}"

type(df.loc[0,'address'])

str

I try:

newdf=pd.DataFrame(data=df['address'].tolist())
newdf=pd.concat((df,newdf),axis=1)

address_df = df.pop("address")
new_df = pd.concat([df, address_df.apply(pd.Series)], axis = 1)

pd.concat([df.drop(columns='Adress'),  pd.json_normalize(df['Adress'])], axis=1) 

nothing thats work

enter image description here

Upvotes: 0

Views: 53

Answers (2)

Inputvector
Inputvector

Reputation: 1093

Here is a solution:

I created a sample dataset:

dic = {'district': 'Centro', 'city': 'SEROPEDICA', 'state': 'RJ', 'country': 'BR', 'latitude': -22.744438, 'longitude': -43.70285, 'timezone': 'Etc/GMT+3', 'zipCode': '23895005', 'streetName': 'Avenida Ministro Fernando Costa', 'streetNumber': '547'}
df = pd.DataFrame({'A': np.random.randint(10, size=10), 
                   'B': np.random.randint(10, size=10), 
                   'address' : str(dic) })

Looks like:

    A   B   address
0   0   6   {'district': 'Centro', 'city': 'SEROPEDICA', '...
1   4   5   {'district': 'Centro', 'city': 'SEROPEDICA', '...
2   9   9   {'district': 'Centro', 'city': 'SEROPEDICA', '...
3   8   7   {'district': 'Centro', 'city': 'SEROPEDICA', '...
4   4   6   {'district': 'Centro', 'city': 'SEROPEDICA', '...
5   2   4   {'district': 'Centro', 'city': 'SEROPEDICA', '...
6   6   9   {'district': 'Centro', 'city': 'SEROPEDICA', '...
7   9   5   {'district': 'Centro', 'city': 'SEROPEDICA', '...
8   5   0   {'district': 'Centro', 'city': 'SEROPEDICA', '...
9   6   4   {'district': 'Centro', 'city': 'SEROPEDICA', '...

I created a temp dataframe with dictionary values:

df_temp = df.apply(lambda row : pd.Series(eval(row['address'])) , axis=1) 
df= pd.concat([df, df_temp], axis=1).drop(['address'],axis=1)

Final Dataset:

    A   B   district city    state  country latitude    longitude   timezone    zipCode streetName  streetNumber
0   5   5   Centro  SEROPEDICA  RJ  BR  -22.744438  -43.70285   Etc/GMT+3   23895005    Avenida Ministro Fernando Costa 547
1   6   8   Centro  SEROPEDICA  RJ  BR  -22.744438  -43.70285   Etc/GMT+3   23895005    Avenida Ministro Fernando Costa 547
2   4   7   Centro  SEROPEDICA  RJ  BR  -22.744438  -43.70285   Etc/GMT+3   23895005    Avenida Ministro Fernando Costa 547
3   8   7   Centro  SEROPEDICA  RJ  BR  -22.744438  -43.70285   Etc/GMT+3   23895005    Avenida Ministro Fernando Costa 547
4   1   9   Centro  SEROPEDICA  RJ  BR  -22.744438  -43.70285   Etc/GMT+3   23895005    Avenida Ministro Fernando Costa 547
5   1   2   Centro  SEROPEDICA  RJ  BR  -22.744438  -43.70285   Etc/GMT+3   23895005    Avenida Ministro Fernando Costa 547
6   0   2   Centro  SEROPEDICA  RJ  BR  -22.744438  -43.70285   Etc/GMT+3   23895005    Avenida Ministro Fernando Costa 547
7   4   8   Centro  SEROPEDICA  RJ  BR  -22.744438  -43.70285   Etc/GMT+3   23895005    Avenida Ministro Fernando Costa 547
8   0   7   Centro  SEROPEDICA  RJ  BR  -22.744438  -43.70285   Etc/GMT+3   23895005    Avenida Ministro Fernando Costa 547
9   1   7   Centro  SEROPEDICA  RJ  BR  -22.744438  -43.70285   Etc/GMT+3   23895005    Avenida Ministro Fernando Costa 547

Note: In this case address column has the same value in every row. I assume, in your original data has different dictionaries in every row.

Upvotes: 1

CFreitas
CFreitas

Reputation: 1775

You can use pd.json_normalize:

df = df.join(pd.json_normalize(df.address))

Upvotes: 0

Related Questions