Defining a column based on concatenating values from other columns

Question

I have a data frame defined as follows:

import pandas as pd

temp2 = pd.DataFrame({
    'id': [0, 1, 2, 3],
    'participant_type_dic': [{'0': 'Victim', '1': 'Victim'},
                             {'0': 'Victim', '1': 'Suspect'},
                             {'0': 'Victim'},
                             {'0': 'Victim', '1': 'Victim'}],
    'gun_stolen': ['yes', 'no', 'yes', 'yes'],
    'participant_age_category': [{'0': 'Adult', '1': 'Adult'},
                                 {'0': 'Adult', '1': 'Teen'},
                                 {'0': 'Adult'},
                                 {'0': 'Adult', '1': 'Teen'}],
})


id | participant_type_dic            | gun_stolen | participant_age_category
-------------------------------------------------------------------------------
0  | {'0': 'Victim', '1': 'Victim'}  | yes        | {'0': 'Adult', '1': 'Adult'}
1  | {'0': 'Victim', '1': 'Suspect'} | no         | {'0': 'Adult', '1': 'Teen'}
2  | {'0': 'Victim'}                 | yes        | {'0': 'Adult'}
3  | {'0': 'Victim', '1': 'Victim'}  | yes        | {'0': 'Adult', '1': 'Teen'}

The data frame has four columns for simplicity. Two of them are of particular interest: participant_type_dic and participant_age_category. Each entry in participant_type_dic is a dictionary of the form {'0': 'Victim', '1': 'Victim'}, where each key identifies a participant and the value gives that participant's role, so the participant with key '0' is a victim. Likewise, in participant_age_category, the value under key '0' is the age category of that same participant, as in {'0': 'Adult', '1': 'Adult'}.

Therefore, in the first row there are two victims, and both are adults. Similarly, in the second row there is one adult victim and one teen suspect. The goal is to get the count of adult victims in each row. Hence, the desired output is:

Desired Output

id | participant_type_dic            | gun_stolen | participant_age_category     | adult_victim
-------------------------------------------------------------------------------------------------
0  | {'0': 'Victim', '1': 'Victim'}  | yes        | {'0': 'Adult', '1': 'Adult'} | 2
1  | {'0': 'Victim', '1': 'Suspect'} | no         | {'0': 'Adult', '1': 'Teen'}  | 1
2  | {'0': 'Victim'}                 | yes        | {'0': 'Adult'}               | 1
3  | {'0': 'Victim', '1': 'Victim'}  | yes        | {'0': 'Adult', '1': 'Teen'}  | 1
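
For illustration, the counting rule for a single row can be sketched in plain Python. This is only a sketch for the row with id 3; the variable names participant_type, participant_age, pairs, and tally are made up for this example, and it assumes both dictionaries share the same keys.

```python
from collections import Counter

# Row with id 3 from the example data (names here are illustrative only).
participant_type = {'0': 'Victim', '1': 'Victim'}
participant_age = {'0': 'Adult', '1': 'Teen'}

# Pair each participant's role with their age category via the shared key.
# Assumption: every key in participant_type also exists in participant_age.
pairs = [(participant_type[k], participant_age[k]) for k in participant_type]

# Tally every (role, age) combination, then read off the one of interest.
tally = Counter(pairs)
adult_victims = tally[('Victim', 'Adult')]

print(pairs)          # [('Victim', 'Adult'), ('Victim', 'Teen')]
print(adult_victims)  # 1
```

The row has two victims, but only the participant with key '0' is an adult, which is why the desired adult_victim value for this row is 1.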

I came up with the following idea to count adult victims:

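from collections import defaultdict

d = defaultdict(int)
for _, row in temp2.iterrows():
    # look columns up by name rather than by position
    for j in row['participant_type_dic'].keys():
        str1 = row['participant_type_dic'][j]
        str2 = row['participant_age_category'][j]
        s = str1 + '-' + str2
        d[s] += 1
        print(d)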

This gives the following output:

defaultdict(<class 'int'>, {'Victim-Adult': 1})
defaultdict(<class 'int'>, {'Victim-Adult': 2})
defaultdict(<class 'int'>, {'Victim-Adult': 3})
defaultdict(<class 'int'>, {'Victim-Adult': 3, 'Suspect-Teen': 1})
defaultdict(<class 'int'>, {'Victim-Adult': 4, 'Suspect-Teen': 1})
defaultdict(<class 'int'>, {'Victim-Adult': 5, 'Suspect-Teen': 1})
defaultdict(<class 'int'>, {'Victim-Adult': 5, 'Suspect-Teen': 1, 'Victim-Teen': 1})

But this is not quite what I want: it keeps one running tally across the whole data frame instead of producing a count per row. I am looking for a function that can be applied to the data frame to obtain the desired output. Any help is appreciated.

Chris · Accepted Answer

One way is to use pandas.DataFrame.apply with a custom function:

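def adult_victim(ser):
    # ser is one row of the data frame (axis=1 passes each row as a Series)
    cnt = 0
    for k, v in ser["participant_age_category"].items():
        # the same key k identifies the participant in both dictionaries
        if (v, ser["participant_type_dic"][k]) == ("Adult", "Victim"):
            cnt += 1
    return cnt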

temp2["adult_victim"] = temp2.apply(adult_victim, axis=1)
print(temp2)

Output:

  gun_stolen  id      participant_age_category  \
0        yes   0  {'1': 'Adult', '0': 'Adult'}   
1         no   1   {'1': 'Teen', '0': 'Adult'}   
2        yes   2                {'0': 'Adult'}   
3        yes   3   {'1': 'Teen', '0': 'Adult'}   

              participant_type_dic  adult_victim  
0   {'1': 'Victim', '0': 'Victim'}             2  
1  {'1': 'Suspect', '0': 'Victim'}             1  
2                  {'0': 'Victim'}             1  
3   {'1': 'Victim', '0': 'Victim'}             1
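
As a possible variation on the same idea, the two dictionary columns can be zipped together and counted in a list comprehension instead of going through apply. The sketch below is not part of the answer above; count_pairs is a hypothetical helper introduced here, and it uses dict.get so that a participant key missing from the age dictionary is simply not counted (an assumption about how such rows should be treated).

```python
import pandas as pd

temp2 = pd.DataFrame({
    'id': [0, 1, 2, 3],
    'participant_type_dic': [{'0': 'Victim', '1': 'Victim'},
                             {'0': 'Victim', '1': 'Suspect'},
                             {'0': 'Victim'},
                             {'0': 'Victim', '1': 'Victim'}],
    'gun_stolen': ['yes', 'no', 'yes', 'yes'],
    'participant_age_category': [{'0': 'Adult', '1': 'Adult'},
                                 {'0': 'Adult', '1': 'Teen'},
                                 {'0': 'Adult'},
                                 {'0': 'Adult', '1': 'Teen'}],
})


def count_pairs(types, ages, role="Victim", age="Adult"):
    """Hypothetical helper: count participants with the given role and age.

    types and ages are the two per-row dictionaries; a key that is absent
    from ages is treated as a non-match rather than raising KeyError.
    """
    return sum(1 for k, r in types.items() if r == role and ages.get(k) == age)


# Walk the two columns in parallel and build the new column in one pass.
temp2["adult_victim"] = [
    count_pairs(t, a)
    for t, a in zip(temp2["participant_type_dic"],
                    temp2["participant_age_category"])
]

print(temp2["adult_victim"].tolist())  # [2, 1, 1, 1]
```

The role and age parameters make it easy to count other combinations, for example count_pairs(t, a, role="Suspect", age="Teen").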
