jm2583
jm2583

Reputation: 15

Merging Contents of Column in Dataframe for each Row that share a common value

I have data in a table that is assigned to various individuals. However, a lot of the data is identical based on MD5 and I want each entry to be attributed to each person with the same MD5. For example, the input data might looks like the following:

Input

However, I want the output to be as per the table below. By way of example, Owner for A,C,E is the same as their MD5 is identical (appreciate MD5 would be longer, this is just for illustration). Similarly for B,D,F, the MD5 is the same, so the Owner is also merged for all 3 entries.

Output data

I have written a script that works (but is hugely inefficient):

md5OwnerDF = pd.DataFrame({"MD5 Digest":[],"All_Owners":[]})
for md5 in uniqueTLMD5List:
        # sub dataframe with md5
        md5DF = df.loc[df['MD5 Digest'] == md5].copy()
        # bring in owners of this sub dataframe (md5DF) and convert into unique list (ownerList)
        ownerList = ';'.join([str(elem) for elem in md5DF.loc[:,'Owner'].unique().tolist()])
        md5OwnerDF_temp = pd.DataFrame({"MD5 Digest":[md5],"All_Owners":[ownerList]})
        md5OwnerDF = md5OwnerDF.append(md5CustodianDF_temp)

Is there a better way of achieving the same result without iteration?

Thank you in advance.

Upvotes: 0

Views: 51

Answers (1)

BENY
BENY

Reputation: 323266

You can try with transform + groupby

df['new'] = df.roupby('ID')['MD5'].transform(','.join)

Upvotes: 2

Related Questions