How to merge specific cells from a csv row that have the same value

Question

I have a set of data with 5 columns, and some rows contain the same value in row[0], row[1] and row[3].

For example:

1   2   3   4   5
A   AB  AD  AC  AE
R   c   W   E   F
A   AB  B   AC  D
...

I want to merge the rows that have the same value in row[0] by concatenating the unique values in each of the other columns. The output after processing should be something like:

1   2   3    4   5
A   AB  ADB  AC  AED
R   c   W    E   F
...

I am trying to do this in pure Python by building a dictionary keyed on row[0], but I could not figure out how to loop over the other rows and append their values. I think the code should look something like:

import csv

result = {}

with open('mydata.csv', 'r', encoding='utf-8') as f:
    reader = csv.reader(f)
    rows = list(reader)

for row in rows:
    idx = row[0]
    values = row[1:]
    if idx in result:
        # Pseudocode: [loop and append] marks the part I am missing
        result[idx] = [values[0], values[1], [loop and append], values[3], [loop and append]]
        print(result[idx])
    else:
        result[idx] = values

Chris · Accepted Answer

Use pandas.DataFrame.groupby with agg and pd.Series.unique:

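import pandas as pd
df = pd.read_csv('mydata.csv')  # assumes a comma-separated file with header 1,2,3,4,5
new_df = df.groupby('1').agg(lambda x: ''.join(x.unique())).reset_index()
print(new_df)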

Output:

   1   2    3   4    5
0  A  AB  ADB  AC  AED
1  R   c    W   E    F
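
Since the question asks for Python only, the same grouping can also be sketched with just the standard library. This is a minimal sketch, not a drop-in replacement for the pandas version: the helper name merge_rows is hypothetical, the inline sample stands in for mydata.csv, and the file is assumed to be comma-separated. It keeps one list of unique values per column for each key, in first-seen order, and joins them at the end.

```python
import csv
import io


def merge_rows(rows):
    """Merge rows sharing the same first cell, joining unique values per column."""
    merged = {}  # key -> one list of unique values per remaining column
    for row in rows:
        key, values = row[0], row[1:]
        if key not in merged:
            # First time this key is seen: start one list per column
            merged[key] = [[v] for v in values]
        else:
            # Key already seen: append only values not yet recorded for that column
            for seen, v in zip(merged[key], values):
                if v not in seen:
                    seen.append(v)
    return [[key] + [''.join(col) for col in cols] for key, cols in merged.items()]


# Inline sample standing in for mydata.csv (assumed comma-separated)
sample = """1,2,3,4,5
A,AB,AD,AC,AE
R,c,W,E,F
A,AB,B,AC,D
"""

reader = csv.reader(io.StringIO(sample))
header = next(reader)  # keep the header row out of the grouping
result = merge_rows(reader)

for row in [header] + result:
    print(row)
```

To run this against the real file, replace io.StringIO(sample) with the file object from open('mydata.csv', 'r', encoding='utf-8', newline=''). The row order of the output relies on dictionaries preserving insertion order, which holds in Python 3.7 and later.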
