Groupby specific column then assign new values base on conditions

Question

I have a dataframe that looks like this below.

Name    M_Name       Tag        
John                  1
Dave      a           1
Mary                  1      
Sam                   1     
Chris     a           1
John                  2
Nola      f           2
Chuck                 2      
Rob                   2     
Chris     a           2
Angie                 3
Joe                   3
 :        :           : 
Tony                  n

I need to create a new column Tag_2, which is based on Tag column and M_Name column. First, I need to groupby Tag column then if M_Name column have more than 1 different letter, the whole Tag group will have a new value "Invalid" under Tag_2 column. If it only has 1 letter in the M-Name for the same Tag group then it will be "Valid". Here is what the output should look like.

Name    M_Name       Tag          Tag_2       
John                  1           Valid
Dave      a           1           Valid
Mary                  1           Valid           
Sam                   1           Valid     
Chris     a           1           Valid
John                  2          Invalid
Nola      f           2          Invalid          
Chuck                 2          Invalid      
Rob                   2          Invalid     
Chris     a           2          Invalid
Angie                 3           Valid
Joe                   3           Valid
 :        :           : 
Tony                  n           Valid

Tag 2 group is "Invalid" because the M_Name rows has more than 1 different letter (f & a) for the tag 2 group.

I am thinking to use groupby to solve this task but I do not know how to add a condition like this. df.groupby('Tag')['M_Name'] I am wondering am I on the right path using groupby or are ther any better approach that I should consider?

Thanks!

ddejohn · Accepted Answer

I'm sure there's a nicer way, but here's one solution:

def validate(s):
    return "Invalid" if len(set(s)) > 2 else "Valid"


mapping = df.groupby("Tag")["M_Name"].apply(validate)
df["Tag_2"] = df.loc[:, "Tag"].replace(mapping)

Output:

     Name M_Name  Tag    Tag_2
0    John           1    Valid
1    Dave      a    1    Valid
2    Mary           1    Valid
3     Sam           1    Valid
4   Chris      a    1    Valid
5    John           2  Invalid
6    Nola      f    2  Invalid
7   Chuck           2  Invalid
8     Rob           2  Invalid
9   Chris      a    2  Invalid
10  Angie           3    Valid
11    Joe           3    Valid

Groupby specific column then assign new values base on conditions

Answers (2)

Related Questions