Reputation: 81
Category Data Age
A1 30 {'Age1_Set': 25.6, 'WIndex': 343.3, 'Age2_Set': 22.6}
A2 20 {'Age1_Set': 35.2, 'WIndex': 343.3, 'Age2_Set': 42.1}
A3 20 {'Age1_Set': 26.5, 'WIndex': 343.3, 'Age2_Set': 11.1}
My pandas df is like above. Age should be the min of 'Age1_Set' and 'Age2_Set'. I am able to deal with this if Age was not a dict and rather three columns.
My output is:
Category Data Age
A1 30 22.6
A2 20 35.2
A3 20 11.1
How can this be done?
EDIT: I have another entry in original df
Category Data Age
A4 20
Age is null basically. How to deal with such case.
Upvotes: 2
Views: 60
Reputation: 1688
If you only need to compare these two values of the dict, you can use the following method:
df['Age']=df['Age'].apply(lambda x: min(x["Age1_Set"], x["Age2_Set"]))
This avoids the hassle of converting the dictionary to other data formats and is IMO easily readable.
If you need to more complex logic, then you might want to extract the lambda into its own function. For example, when you need to deal with entries that are None:
def get_min_age(entry:dict)->float:
if entry is None:
return 0
else:
return min(entry["age1"], entry["age2"])
df['Age']=df['Age'].apply(get_min_age)
Using .apply()
is not an ideal approach when dealing with large datasets (>100k rows), since it treats every row in a sequential way.
If you notice that performance becomes a bottleneck, the vectorized methods in the other answers will be more performant (as this article explains in more detail).
Upvotes: 2
Reputation: 4827
You can try:
df = pd.DataFrame({"Category": ["A1", "A2", "A3"],
"Data": [30, 20, 20],
"Age": [{'Age1_Set': 25.6, 'WIndex': 343.3, 'Age2_Set': 22.6}, {'Age1_Set': 35.2, 'WIndex': 343.3, 'Age2_Set': 42.1}, {'Age1_Set': 26.5, 'WIndex': 343.3, 'Age2_Set': 11.1}]})
df["Min_Age"] = [min(x["Age1_Set"], x["Age2_Set"]) for x in df.Age]
print(df)
Result:
Category Data Age Min_Age
0 A1 30 {'Age1_Set': 25.6, 'WIndex': 343.3, 'Age2_Set'... 22.6
1 A2 20 {'Age1_Set': 35.2, 'WIndex': 343.3, 'Age2_Set'... 35.2
2 A3 20 {'Age1_Set': 26.5, 'WIndex': 343.3, 'Age2_Set'... 11.1
Upvotes: 1
Reputation: 71689
Create a new dataframe from the dictionaries in the Age
column, then filter
the Age
like columns and use min
along axis=1
to get the minimum of age from Age1_Set
and Age2_Set
:
df['Age'] = pd.DataFrame(df['Age'].tolist()).filter(like='Age').min(1)
Alternatively you can also use Series.str.get
+ np.minimum
:
df['Age'] = np.minimum(df['Age'].str.get('Age1_Set'), df['Age'].str.get('Age2_Set'))
Category Data Age
0 A1 30 22.6
1 A2 20 35.2
2 A3 20 11.1
Upvotes: 3
Reputation: 377
You could just pass your data column without any extra params.
import pandas as pd
df2 = pd.json_normalize(df['Age'])
Upvotes: 0