shbfy

Reputation: 2135

Create a JSON string from column values

How do I mutate a column of a pandas DataFrame that holds a Series of dictionaries?

Given the following DataFrame:

import json
import pandas as pd

data = [['tom', 10], ['nick', 15], ['juli', 14]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# add dict series
df = df.assign(my_dict="{}")
df.my_dict = df.my_dict.apply(json.loads)

Name  Age  my_dict
tom   10   {}
nick  15   {}
juli  14   {}

How would I operate on column my_dict and mutate it as follows:

Age > 10

Name Age my_dict
tom 10 {"age>10": false}
nick 15 {"age>10": true}
juli 14 {"age>10": true}

And then mutate again:

Name = "tom":

Name Age my_dict
tom 10 {"age>10": false, "name=tom": true}
nick 15 {"age>10": true, "name=tom": false}
juli 14 {"age>10": true, "name=tom": false}

I'm interested in the process of mutating the dictionary; the rules themselves are arbitrary examples.

Upvotes: 0

Views: 831

Answers (2)

butterflyknife

Reputation: 1574

apply is generally considered slow. Here are two alternatives, both using list comprehensions, which according to this highly voted answer are slightly faster than apply.

import pandas as pd
data = [['tom', 10], ['nick', 15], ['juli', 14]]
df = pd.DataFrame(data, columns = ['Name', 'Age'])

# Define your weird func: takes a row of df and returns your dict
def weird_func2(row):
    return {"name=tom":row["Name"]=="tom", "age>10":row["Age"]>10}

# add dict series
df["mydict"] = [weird_func2(row) for _, row in df.iterrows()]
df

Or you can try:

import pandas as pd
data = [['tom', 10], ['nick', 15], ['juli', 14]]
df = pd.DataFrame(data, columns = ['Name', 'Age'])

# Define your weird func: takes a row of df and returns your dict
def weird_func(name, age):
    return {"name=tom":name=="tom", "age>10":age>10}

# add dict series
df["mydict"] = [weird_func(name, age) for name, age in zip(df["Name"], df["Age"])]
df
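
Both versions above build the whole dict in one pass. If the rules arrive one at a time, as in the question, the same list-comprehension idea can extend the existing dicts instead. The following is only a sketch: add_rule is a hypothetical helper name, not part of pandas, and it assumes the column is called my_dict and already holds one dict per row.

```python
import pandas as pd

df = pd.DataFrame([["tom", 10], ["nick", 15], ["juli", 14]], columns=["Name", "Age"])
df["my_dict"] = [{} for _ in range(len(df))]  # one fresh dict per row


def add_rule(frame, key, mask, column="my_dict"):
    """Return a copy of frame whose dict column also records mask under key.

    Hypothetical helper for illustration; mask is a boolean Series aligned with frame.
    """
    out = frame.copy()
    # {**d, key: ...} builds a new dict per row, so the input dicts are left untouched
    out[column] = [{**d, key: bool(flag)} for d, flag in zip(frame[column], mask)]
    return out


df = add_rule(df, "age>10", df["Age"] > 10)
df = add_rule(df, "name=tom", df["Name"] == "tom")
rows = df["my_dict"].tolist()
print(rows)
```

The bool() call converts numpy.bool_ values to plain Python booleans, which keeps the dicts serialisable with json.dumps later on.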

Upvotes: 1

Corralien

Reputation: 120409

You can use the dict merge operator | (available from Python 3.9), which returns a new dict for each row:

df['my_dict'] = df.apply(lambda x: x['my_dict'] | {'Age': x['Age'] > 10}, axis=1)
print(df)

# Output
   Name  Age         my_dict
0   tom   10  {'Age': False}
1  nick   15   {'Age': True}
2  juli   14   {'Age': True}

Add a new condition:

df['my_dict'] = df.apply(lambda x: x['my_dict'] | {'Name': x['Name'] == 'tom'}, axis=1)
print(df)

# Output
   Name  Age                       my_dict
0   tom   10  {'Age': False, 'Name': True}
1  nick   15  {'Age': True, 'Name': False}
2  juli   14  {'Age': True, 'Name': False}
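
Because the | operator creates a new dict for every row, the column has to be reassigned each time. If true in-place mutation is wanted, a plain loop over the column can be used instead. This is a minimal sketch that assumes every row holds its own dict object; if the same dict were shared across rows, every row would change at once.

```python
import pandas as pd

df = pd.DataFrame([["tom", 10], ["nick", 15], ["juli", 14]], columns=["Name", "Age"])
df["my_dict"] = [{} for _ in range(len(df))]  # a separate dict per row, not one shared object

# Mutate each row's dict in place; the column itself is never reassigned
for d, flag in zip(df["my_dict"], df["Age"] > 10):
    d["Age"] = bool(flag)  # bool() turns numpy.bool_ into a plain Python bool

for d, name in zip(df["my_dict"], df["Name"]):
    d["Name"] = name == "tom"

result = df["my_dict"].tolist()
print(result)
```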

To convert the dicts to JSON strings, use:

>>> df['my_dict'].apply(json.dumps)
0    {"Age": false, "Name": true}
1    {"Age": true, "Name": false}
2    {"Age": true, "Name": false}
Name: my_dict, dtype: object
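
For completeness, here is a small round-trip sketch showing that json.loads reverses the conversion, which is how the question built its initial column. The Series below is hand-written sample data that mirrors the first two rows of the output above.

```python
import json

import pandas as pd

# Sample data mirroring the dict column from the answer
s = pd.Series(
    [{"Age": False, "Name": True}, {"Age": True, "Name": False}],
    name="my_dict",
)

as_json = s.apply(json.dumps)     # dict -> JSON string
back = as_json.apply(json.loads)  # JSON string -> dict

print(as_json.tolist())
```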

Upvotes: 1
