Reputation: 501
There is data:
task_id frame m_label
466 15 Other
466 376 Other
466 376 Snapshot
477 5053 Snapshot
477 5053 Transverse
task_id
, frame
and m_label
. I need to add a new column ("Snapshot plus"), which contains only data where in same task_id
the knowledge in the frame
column matches and where when there is a snapshot
entry in the m_label
column. For example line 4, in frame
376 the record in the new column will be "Snapshot | Other ", because the task_id
is the same, and the Frame
is the same.
Output required:
task_id frame m_label Snapshot plus Test
466 15 Other
466 376 Other
466 376 Snapshot Snapshot|Other
477 5053 Snapshot Snapshot|Transverse
477 5053 Transverse
488 677 Snapshot Only one
I wrote the code, here's an example, but it seems to me there are more elegant and clever solutions:
#drop duplicates to another frame
df_dup=df.loc[df.duplicated(subset=['frame'],keep='first'),:]
df_dr=df.drop_duplicates(subset='frame', keep='first')
# merge inner duplicates df and no duplicatets df
merge_df = df_dr.merge(df_dup,on=['task_id', 'frame'],how='inner')
# apply lambda by row for create new column with condition
merge_df['Snapshot plus']=merge_df.apply(lambda row: '|'+ row['m_label_x']+'|'+row['m_label_y']
if (row['m_label_x']=='Snapshot' or row['m_label_y']=='Snapshot' )
else '', axis='columns')
merge_df = merge_df[['task_id','frame','Snapshot plus']]
# merge with original df
df_df = df.merge(merge_df, on=['task_id','frame'], how='left')
# add new column to check if any snapshot label alone in frame and task
df_df['Test'] = (df_df.groupby(['task_id', 'frame'])
['m_label']
.transform(lambda x: len(x)<2 and
'Snapshot' in x.values
)
.map({True: 'Only one', False: ''})
)
# again drop duplicates
df_fin=df_df.drop_duplicates(subset=['frame','Snapshot plus'], keep='first')
Upvotes: 0
Views: 502
Reputation: 927
df["task_id//frame"]=df[["task_id","frame"]].apply(lambda x: str(x[0])+","+str(x[1]),axis=1)
Dict={}
for m,Id in zip(df["m_label"].values,df["task_id//frame"].values):
if m!="Snapshot":
Dict[Id]=m
Dict={'466,15': 'Other', '466,376': 'Other', '477,5053': 'Transverse'}
def Arrange_snaps(row):
if row[0]=="Snapshot":
print(row[1])
if row[1] in Dict.keys():
result=str(row[0])+"|"+str(Dict[row[1]])
else:
result=""
return result
else:
return ""
df["Snapshot plus"]=df[["m_label","task_id//frame"]].apply(Arrange_snaps,axis=1)
For a new dataframe:
Upvotes: 1