Reputation: 1463
I have a data frame as follows:
import pandas as pd

temp2 = pd.DataFrame({
    'id': [0, 1, 2, 3],
    'participant_type_dic': [{'0': 'Victim', '1': 'Victim'},
                             {'0': 'Victim', '1': 'Suspect'},
                             {'0': 'Victim'},
                             {'0': 'Victim', '1': 'Victim'}],
    'gun_stolen': ['yes', 'no', 'yes', 'yes'],
    'participant_age_category': [{'0': 'Adult', '1': 'Adult'},
                                 {'0': 'Adult', '1': 'Teen'},
                                 {'0': 'Adult'},
                                 {'0': 'Adult', '1': 'Teen'}],
})
id | participant_type_dic            | gun_stolen | participant_age_category
------------------------------------------------------------------------------
0  | {'0': 'Victim', '1': 'Victim'}  | yes        | {'0': 'Adult', '1': 'Adult'}
1  | {'0': 'Victim', '1': 'Suspect'} | no         | {'0': 'Adult', '1': 'Teen'}
2  | {'0': 'Victim'}                 | yes        | {'0': 'Adult'}
3  | {'0': 'Victim', '1': 'Victim'}  | yes        | {'0': 'Adult', '1': 'Teen'}
This data frame has 4 columns for simplicity. Two columns are of particular interest: participant_type_dic and participant_age_category. Each entry in participant_type_dic is a dictionary of the form {'0': 'Victim', '1': 'Victim'}, which means that the participant with key '0' is a victim. Similarly, in participant_age_category the participant with key '0' is an adult, as in {'0': 'Adult', '1': 'Adult'}.
Therefore, in the first row there are 2 victims, and both are adults. Similarly, in the second row there is one adult victim and one teen suspect. The goal is to get the count of adult victims in each row. Hence, we desire the following output:
Desired Output
id | participant_type_dic            | gun_stolen | participant_age_category     | adult_victim
-------------------------------------------------------------------------------------------------
0  | {'0': 'Victim', '1': 'Victim'}  | yes        | {'0': 'Adult', '1': 'Adult'} | 2
1  | {'0': 'Victim', '1': 'Suspect'} | no         | {'0': 'Adult', '1': 'Teen'}  | 1
2  | {'0': 'Victim'}                 | yes        | {'0': 'Adult'}               | 1
3  | {'0': 'Victim', '1': 'Victim'}  | yes        | {'0': 'Adult', '1': 'Teen'}  | 1
I came up with the following idea to get the count of adult victims:
from collections import defaultdict

d = defaultdict(int)
for _, row in temp2.iterrows():
    types = row['participant_type_dic']
    ages = row['participant_age_category']
    for key in types:
        # Build a combined label such as 'Victim-Adult' and tally it.
        d[types[key] + '-' + ages[key]] += 1
        print(d)
This gives the following output:
defaultdict(<class 'int'>, {'Victim-Adult': 1})
defaultdict(<class 'int'>, {'Victim-Adult': 2})
defaultdict(<class 'int'>, {'Victim-Adult': 3})
defaultdict(<class 'int'>, {'Victim-Adult': 3, 'Suspect-Teen': 1})
defaultdict(<class 'int'>, {'Victim-Adult': 4, 'Suspect-Teen': 1})
defaultdict(<class 'int'>, {'Victim-Adult': 5, 'Suspect-Teen': 1})
defaultdict(<class 'int'>, {'Victim-Adult': 5, 'Suspect-Teen': 1, 'Victim-Teen': 1})
But this is not what I want: it keeps a running total across the whole data frame instead of a count per row. I am looking for a function that can be applied to the data frame to obtain the desired output. Help is appreciated.
Upvotes: 0
Views: 24
Reputation: 8508
You can also count the matching values in each dictionary like this:
temp2['adults'] = temp2['participant_age_category'].apply(lambda x: sum(v == 'Adult' for v in x.values()))
Here .apply calls the lambda on each entry of the column. For each entry, it takes the values of the dict and checks whether each one equals Adult. Each check yields True (which counts as 1) or False (which counts as 0), so summing them gives the number of adults.
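To make the True/False summing concrete, here is a minimal sketch on a single plain dictionary. The names ages, flags and adult_count are made up for this illustration and are not part of the question's data frame.

```python
# A single dictionary shaped like one entry of participant_age_category.
ages = {'0': 'Adult', '1': 'Teen', '2': 'Adult'}

# Each comparison produces a bool; bool is a subclass of int in Python,
# so True adds 1 and False adds 0 when summed.
flags = [v == 'Adult' for v in ages.values()]
adult_count = sum(flags)

print(flags)        # [True, False, True]
print(adult_count)  # 2
```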
Note: This does not take participant_type_dic into account, i.e. whether the individual is a victim or a suspect. If you want to use both, then @Chris's answer will be best.
Alternatively, you can zip both and check like this:
temp2['adults'] = temp2.apply(lambda x: sum(v ==('Victim','Adult') for v in zip(x['participant_type_dic'].values(),x['participant_age_category'].values())), axis=1)
This will provide you a result of:
id participant_type_dic ... participant_age_category adults
0 0 {'0': 'Victim', '1': 'Victim'} ... {'0': 'Adult', '1': 'Adult'} 2
1 1 {'0': 'Victim', '1': 'Suspect'} ... {'0': 'Adult', '1': 'Teen'} 1
2 2 {'0': 'Victim'} ... {'0': 'Adult'} 1
3 3 {'0': 'Victim', '1': 'Victim'} ... {'0': 'Adult', '1': 'Teen'} 1
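One caveat with zip: it pairs the values by position, so it relies on both dictionaries listing their participants in the same order. If that cannot be guaranteed, a variant that matches on the dictionary key may be safer. The following is only a sketch under that assumption; count_adult_victims is a hypothetical helper name, not something from pandas.

```python
def count_adult_victims(types, ages):
    """Count participants who are both 'Victim' and 'Adult', matched by key."""
    return sum(
        1
        for key, role in types.items()
        # ages.get(key) returns None if the key is missing, so no KeyError.
        if role == 'Victim' and ages.get(key) == 'Adult'
    )

# Same participants, but the second dict lists its keys in a different order.
types = {'0': 'Victim', '1': 'Victim'}
ages = {'1': 'Teen', '0': 'Adult'}
result = count_adult_victims(types, ages)
print(result)  # 1

# Assumed usage on the data frame from the question:
# temp2['adults'] = temp2.apply(
#     lambda row: count_adult_victims(row['participant_type_dic'],
#                                     row['participant_age_category']),
#     axis=1)
```

Matching by key costs a dictionary lookup per participant, but it keeps the count correct even when the two columns were built in different orders or one of them is missing a participant.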
Upvotes: 0
Reputation: 29742
One way using pandas.DataFrame.apply with a custom function:
def adult_victim(ser):
    cnt = 0
    for k, v in ser["participant_age_category"].items():
        if (v, ser["participant_type_dic"][k]) == ("Adult", "Victim"):
            cnt += 1
    return cnt

temp2["adult_victim"] = temp2.apply(adult_victim, axis=1)
print(temp2)
Output:
gun_stolen id participant_age_category \
0 yes 0 {'1': 'Adult', '0': 'Adult'}
1 no 1 {'1': 'Teen', '0': 'Adult'}
2 yes 2 {'0': 'Adult'}
3 yes 3 {'1': 'Teen', '0': 'Adult'}
participant_type_dic adult_victim
0 {'1': 'Victim', '0': 'Victim'} 2
1 {'1': 'Suspect', '0': 'Victim'} 1
2 {'0': 'Victim'} 1
3 {'1': 'Victim', '0': 'Victim'} 1
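If other combinations are needed later (for example teen suspects), the same per-row idea can be generalised by counting every (type, age) pair at once. This is a hedged sketch only: combo_counts is a hypothetical helper, and it assumes that a key refers to the same participant in both dictionaries.

```python
from collections import Counter

def combo_counts(types, ages):
    """Count (type, age) pairs for participants present in both dicts."""
    return Counter((types[k], ages[k]) for k in types if k in ages)

# Dictionaries shaped like the second row of the question's data frame.
row_types = {'0': 'Victim', '1': 'Suspect'}
row_ages = {'0': 'Adult', '1': 'Teen'}

counts = combo_counts(row_types, row_ages)
# A Counter returns 0 for pairs that never occurred.
adult_victims = counts[('Victim', 'Adult')]
print(adult_victims)  # 1
```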
Upvotes: 1