adding new column by condition with duplicated rows in dataframe

Question

There is data:

task_id frame   m_label 
466     15    Other 
466     376   Other 
466     376   Snapshot  
477     5053  Snapshot      
477     5053  Transverse

task_id, frame and m_label. I need to add a new column ("Snapshot plus"), which contains only data where in same task_id the knowledge in the frame column matches and where when there is a snapshot entry in the m_label column. For example line 4, in frame 376 the record in the new column will be "Snapshot | Other ", because the task_id is the same, and the Frame is the same. Output required:

task_id frame   m_label  Snapshot plus       Test
466     15    Other 
466     376   Other 
466     376   Snapshot  Snapshot|Other
477     5053  Snapshot  Snapshot|Transverse 
477     5053  Transverse
488     677   Snapshot                       Only one

I wrote the code, here's an example, but it seems to me there are more elegant and clever solutions:

#drop duplicates to another frame
df_dup=df.loc[df.duplicated(subset=['frame'],keep='first'),:]    
df_dr=df.drop_duplicates(subset='frame', keep='first')
# merge inner duplicates df and no duplicatets df
merge_df = df_dr.merge(df_dup,on=['task_id', 'frame'],how='inner')
# apply lambda by row for create new column with condition
merge_df['Snapshot plus']=merge_df.apply(lambda row: '|'+ row['m_label_x']+'|'+row['m_label_y']
                                            if (row['m_label_x']=='Snapshot' or row['m_label_y']=='Snapshot' )
                                            else '', axis='columns')  
merge_df = merge_df[['task_id','frame','Snapshot plus']]      
# merge with original df
df_df = df.merge(merge_df, on=['task_id','frame'], how='left')  
# add new column to check if any snapshot label alone in frame and task            
df_df['Test'] = (df_df.groupby(['task_id', 'frame'])
                ['m_label']
                .transform(lambda x: len(x)<2 and
                                     'Snapshot' in x.values
                           )
                .map({True: 'Only one', False: ''})
              )
# again drop duplicates
df_fin=df_df.drop_duplicates(subset=['frame','Snapshot plus'], keep='first')

Hakan Akg&#252;n · Accepted Answer

df["task_id//frame"]=df[["task_id","frame"]].apply(lambda x: str(x[0])+","+str(x[1]),axis=1)

Dict={}
for m,Id in zip(df["m_label"].values,df["task_id//frame"].values):
    if m!="Snapshot":
        Dict[Id]=m

Dict={'466,15': 'Other', '466,376': 'Other', '477,5053': 'Transverse'}

def Arrange_snaps(row):
    if row[0]=="Snapshot":
        print(row[1])
        if row[1] in Dict.keys():
            result=str(row[0])+"|"+str(Dict[row[1]])
        else:
            result=""
        return result
    else:
        return ""
df["Snapshot plus"]=df[["m_label","task_id//frame"]].apply(Arrange_snaps,axis=1)

For a new dataframe:

adding new column by condition with duplicated rows in dataframe

Answers (1)

Related Questions