Reputation: 2135
How do I mutate a Pandas DataFrame column that holds a series of dictionaries?
Given the following DataFrame:
import json
import pandas as pd
data = [['tom', 10], ['nick', 15], ['juli', 14]]
df = pd.DataFrame(data, columns = ['Name', 'Age'])
# add dict series: json.loads runs per row, so each row gets its own empty dict
df = df.assign(my_dict="{}")
df.my_dict = df.my_dict.apply(json.loads)
Name | Age | my_dict |
---|---|---|
tom | 10 | {} |
nick | 15 | {} |
juli | 14 | {} |
How would I operate on the column my_dict and mutate it as follows?
Age > 10:
Name | Age | my_dict |
---|---|---|
tom | 10 | {"age>10": false} |
nick | 15 | {"age>10": true} |
juli | 14 | {"age>10": true} |
And then mutate again:
Name = "tom":
Name | Age | my_dict |
---|---|---|
tom | 10 | {"age>10": false, "name=tom": true} |
nick | 15 | {"age>10": true, "name=tom": false} |
juli | 14 | {"age>10": true, "name=tom": false} |
I'm interested in the process of mutating the dictionary; the rules themselves are arbitrary examples.
Upvotes: 0
Views: 831
Reputation: 1574
apply is generally considered slow. Here are two alternatives, both using list comprehensions, which, according to this highly voted answer, are slightly faster than apply.
import pandas as pd
data = [['tom', 10], ['nick', 15], ['juli', 14]]
df = pd.DataFrame(data, columns = ['Name', 'Age'])
# Define your weird func: takes a row of df and returns your dict
def weird_func2(row):
    return {"name=tom": row["Name"] == "tom", "age>10": row["Age"] > 10}
# add dict series (iterrows yields (index, row) pairs; only the row is needed)
df["mydict"] = [weird_func2(row) for _, row in df.iterrows()]
df
Or you can try:
import pandas as pd
data = [['tom', 10], ['nick', 15], ['juli', 14]]
df = pd.DataFrame(data, columns = ['Name', 'Age'])
# Define your weird func: takes a row of df and returns your dict
def weird_func(name, age):
    return {"name=tom": name == "tom", "age>10": age > 10}
# add dict series
df["mydict"] = [weird_func(name, age) for name, age in zip(df["Name"], df["Age"])]
df
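Both versions above build the whole dictionary in one call. Since the question adds rules one at a time, here is a minimal sketch of the same zip-based comprehension merging a single new key into the dicts that are already in the column. The helper name `add_rule` and its parameters are hypothetical, made up for illustration.

```python
import pandas as pd

data = [['tom', 10], ['nick', 15], ['juli', 14]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
# Start with a separate empty dict in every row
df["my_dict"] = [{} for _ in range(len(df))]

# Hypothetical helper: returns new dicts that extend each row's dict by one key
def add_rule(frame, key, flags):
    # bool() guarantees a plain Python bool, which json.dumps can serialize
    return [{**d, key: bool(flag)} for d, flag in zip(frame["my_dict"], flags)]

df["my_dict"] = add_rule(df, "age>10", df["Age"] > 10)
df["my_dict"] = add_rule(df, "name=tom", df["Name"] == "tom")
print(df["my_dict"].tolist())
```

Each call leaves the earlier keys untouched, so further rules can be chained in any order.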
Upvotes: 1
Reputation: 120409
You can use apply with the dict merge operator | (Python 3.9+):
df['my_dict'] = df.apply(lambda x: x['my_dict'] | {'Age': x['Age'] > 10}, axis=1)
print(df)
# Output
Name Age my_dict
0 tom 10 {'Age': False}
1 nick 15 {'Age': True}
2 juli 14 {'Age': True}
Add a new condition:
df['my_dict'] = df.apply(lambda x: x['my_dict'] | {'Name': x['Name'] == 'tom'}, axis=1)
print(df)
# Output
Name Age my_dict
0 tom 10 {'Age': False, 'Name': True}
1 nick 15 {'Age': True, 'Name': False}
2 juli 14 {'Age': True, 'Name': False}
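The merge above creates a new dict for every row and reassigns the column. If the aim is to change the existing dict objects instead, a plain loop over the column is one option. This is only a sketch, and it assumes every row holds its own dict (not one shared object), as is the case when the column is built per row.

```python
import pandas as pd

data = [['tom', 10], ['nick', 15], ['juli', 14]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
# One distinct dict per row; [{}] * len(df) would share a single dict instead
df['my_dict'] = [{} for _ in range(len(df))]

# Update each row's dict in place; the column itself is never reassigned
for d, flag in zip(df['my_dict'], df['Age'] > 10):
    d['Age'] = bool(flag)
for d, flag in zip(df['my_dict'], df['Name'] == 'tom'):
    d['Name'] = bool(flag)

print(df['my_dict'].tolist())
```

Because the dicts are modified where they live, any other reference to the same dict objects sees the new keys as well.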
If you want to convert the dictionaries to JSON strings, use:
>>> df['my_dict'].apply(json.dumps)
0 {"Age": false, "Name": true}
1 {"Age": true, "Name": false}
2 {"Age": true, "Name": false}
Name: my_dict, dtype: object
Upvotes: 1