jay

Reputation: 1463

Defining a column based on concatenating values from other columns

I have a data frame defined as follows:

import pandas as pd

temp2 = pd.DataFrame({
    'id': [0, 1, 2, 3],
    'participant_type_dic': [{'0': 'Victim', '1': 'Victim'},
                             {'0': 'Victim', '1': 'Suspect'},
                             {'0': 'Victim'},
                             {'0': 'Victim', '1': 'Victim'}],
    'gun_stolen': ['yes', 'no', 'yes', 'yes'],
    'participant_age_category': [{'0': 'Adult', '1': 'Adult'},
                                 {'0': 'Adult', '1': 'Teen'},
                                 {'0': 'Adult'},
                                 {'0': 'Adult', '1': 'Teen'}],
})


id | participant_type_dic            | gun_stolen | participant_age_category
--------------------------------------------------------------------------------
0  | {'0': 'Victim', '1': 'Victim'}  | yes        | {'0': 'Adult', '1': 'Adult'}
1  | {'0': 'Victim', '1': 'Suspect'} | no         | {'0': 'Adult', '1': 'Teen'}
2  | {'0': 'Victim'}                 | yes        | {'0': 'Adult'}
3  | {'0': 'Victim', '1': 'Victim'}  | yes        | {'0': 'Adult', '1': 'Teen'}

For simplicity, this data frame has only 4 columns. Two of them are of particular interest: participant_type_dic and participant_age_category. Each entry of participant_type_dic is a dictionary of the form {'0': 'Victim', '1': 'Victim'}, meaning that the participant with key '0' is a victim. Likewise, in participant_age_category the same key gives that participant's age category, so {'0': 'Adult', '1': 'Adult'} means that participant '0' is an adult.

Therefore, in the first row there are 2 victims, and both are adults. In the second row there is one adult victim and one teen suspect. The goal is to count the adult victims in each row, so the desired output is:

Desired Output

id | participant_type_dic            | gun_stolen | participant_age_category     | adult_victim
-----------------------------------------------------------------------------------------------
0  | {'0': 'Victim', '1': 'Victim'}  | yes        | {'0': 'Adult', '1': 'Adult'} | 2
1  | {'0': 'Victim', '1': 'Suspect'} | no         | {'0': 'Adult', '1': 'Teen'}  | 1
2  | {'0': 'Victim'}                 | yes        | {'0': 'Adult'}               | 1
3  | {'0': 'Victim', '1': 'Victim'}  | yes        | {'0': 'Adult', '1': 'Teen'}  | 1

I came up with the following idea to count the adult victims:

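from collections import defaultdict

d = defaultdict(int)
for _, row in temp2.iterrows():
    # Look the columns up by name rather than by position in the row.
    for j in row['participant_type_dic'].keys():
        str1 = row['participant_type_dic'][j]
        str2 = row['participant_age_category'][j]
        s = str1 + '-' + str2
        d[s] += 1
        print(d)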

This gives the following output:

defaultdict(<class 'int'>, {'Victim-Adult': 1})
defaultdict(<class 'int'>, {'Victim-Adult': 2})
defaultdict(<class 'int'>, {'Victim-Adult': 3})
defaultdict(<class 'int'>, {'Victim-Adult': 3, 'Suspect-Teen': 1})
defaultdict(<class 'int'>, {'Victim-Adult': 4, 'Suspect-Teen': 1})
defaultdict(<class 'int'>, {'Victim-Adult': 5, 'Suspect-Teen': 1})
defaultdict(<class 'int'>, {'Victim-Adult': 5, 'Suspect-Teen': 1, 'Victim-Teen': 1})

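But this is not quite what I want: it tallies every type-age combination over the whole data frame instead of counting the adult victims in each row. I am looking for a function that can be applied to the data frame to produce the desired output. Any help is appreciated.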

Upvotes: 0

Views: 24

Answers (2)

Joe Ferndz

Reputation: 8508

You can also count the matching values in each dictionary like this:

temp2['adults'] = temp2['participant_age_category'].apply(lambda x: sum(v == 'Adult' for v in x.values()))

Here .apply calls the lambda on each entry of the column, one per row. For each entry, it takes the values of the dict and checks whether each one equals 'Adult'. Every check yields True (which counts as 1) or False (which counts as 0), and sum adds them up to give the count.
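
To make the counting step concrete, here is a minimal sketch on a single dictionary, copied from row 3 of the question's data. The names ages, flags and adult_count are only illustrative.

```python
# One cell from participant_age_category (row 3 of the question's data).
ages = {'0': 'Adult', '1': 'Teen'}

# Each comparison yields a bool. bool is a subclass of int,
# so sum() treats True as 1 and False as 0.
flags = [v == 'Adult' for v in ages.values()]
adult_count = sum(flags)

print(flags)        # [True, False]
print(adult_count)  # 1
```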

Note: This does not take participant_type_dic into account, i.e. whether the individual is a victim or a suspect. If you want to use both columns, @Chris's answer is the better fit.

Alternatively, you can zip the values of both dictionaries and compare each pair:

temp2['adults'] = temp2.apply(lambda x: sum(v == ('Victim', 'Adult') for v in zip(x['participant_type_dic'].values(), x['participant_age_category'].values())), axis=1)

This gives the following result:

   id             participant_type_dic  ...      participant_age_category adults
0   0   {'0': 'Victim', '1': 'Victim'}  ...  {'0': 'Adult', '1': 'Adult'}      2
1   1  {'0': 'Victim', '1': 'Suspect'}  ...   {'0': 'Adult', '1': 'Teen'}      1
2   2                  {'0': 'Victim'}  ...                {'0': 'Adult'}      1
3   3   {'0': 'Victim', '1': 'Victim'}  ...   {'0': 'Adult', '1': 'Teen'}      1
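
One caveat worth keeping in mind: zip pairs the values by position, so the approach above assumes both dictionaries list their keys in the same order. The sketch below uses two made-up dictionaries (not taken from the question's data) whose keys are in different orders, and compares positional pairing with a lookup by key. The names by_position and by_key are illustrative only.

```python
# Hypothetical cells with the same keys but in a different order.
types = {'0': 'Victim', '1': 'Suspect'}
ages = {'1': 'Teen', '0': 'Adult'}

# Positional pairing: ('Victim', 'Teen') and ('Suspect', 'Adult'), so no match.
by_position = sum(
    pair == ('Victim', 'Adult')
    for pair in zip(types.values(), ages.values())
)

# Pairing by key: participant '0' is both a victim and an adult.
by_key = sum(
    role == 'Victim' and ages.get(key) == 'Adult'
    for key, role in types.items()
)

print(by_position)  # 0
print(by_key)       # 1
```

If the dictionaries are always built with matching key order, as in the question's sample, both variants should agree.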

Upvotes: 0

Chris

Reputation: 29742

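One way is to use pandas.DataFrame.apply with a custom function:

def adult_victim(ser):
    # ser is one row; both dict columns share the same participant keys.
    cnt = 0
    for k, v in ser["participant_age_category"].items():
        # Count the participant only if they are both an adult and a victim.
        if (v, ser["participant_type_dic"][k]) == ("Adult", "Victim"):
            cnt += 1
    return cnt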

temp2["adult_victim"] = temp2.apply(adult_victim, axis=1)
print(temp2)

Output:

  gun_stolen  id      participant_age_category  \
0        yes   0  {'1': 'Adult', '0': 'Adult'}   
1         no   1   {'1': 'Teen', '0': 'Adult'}   
2        yes   2                {'0': 'Adult'}   
3        yes   3   {'1': 'Teen', '0': 'Adult'}   

              participant_type_dic  adult_victim  
0   {'1': 'Victim', '0': 'Victim'}             2  
1  {'1': 'Suspect', '0': 'Victim'}             1  
2                  {'0': 'Victim'}             1  
3   {'1': 'Victim', '0': 'Victim'}             1  
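
If other combinations are needed later (teen victims, for example), the same key-based idea can be parameterised. This is only a sketch under that assumption: count_pairs and the teen_victim column are hypothetical names, not part of the original answer, and the list comprehension is used here as an alternative to apply with axis=1.

```python
import pandas as pd

temp2 = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "participant_type_dic": [{"0": "Victim", "1": "Victim"},
                             {"0": "Victim", "1": "Suspect"},
                             {"0": "Victim"},
                             {"0": "Victim", "1": "Victim"}],
    "gun_stolen": ["yes", "no", "yes", "yes"],
    "participant_age_category": [{"0": "Adult", "1": "Adult"},
                                 {"0": "Adult", "1": "Teen"},
                                 {"0": "Adult"},
                                 {"0": "Adult", "1": "Teen"}],
})


def count_pairs(types, ages, role="Victim", age="Adult"):
    """Count participant keys whose type equals role and whose age category equals age."""
    return sum(1 for k, v in ages.items() if v == age and types.get(k) == role)


# Walk the two dict columns in parallel, one pair of dicts per row.
pairs = list(zip(temp2["participant_type_dic"], temp2["participant_age_category"]))
temp2["adult_victim"] = [count_pairs(t, a) for t, a in pairs]
temp2["teen_victim"] = [count_pairs(t, a, age="Teen") for t, a in pairs]

print(temp2[["id", "adult_victim", "teen_victim"]])
```

On the sample data, adult_victim comes out as 2, 1, 1, 1, matching the desired output in the question.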

Upvotes: 1
