Reputation: 303
I've tried searching Stack Overflow but couldn't find a solution to my problem.
I want to concatenate columns that share the same column name:
Example:
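import pandas as pd
# 'data' instead of 'input' to avoid shadowing the built-in input()
data = {'A': [0, 1, 0, 1, 0], 'B': [0, 1, 1, 1, 1], 'C': [1, 1, 1, 1, 0],
        'D': [1, 1, 0, 0, 0], 'E': [1, 0, 1, 0, 1]}
df = pd.DataFrame(data)
df.columns = ['A', 'B', 'C', 'C', 'B']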
   A  B  C  C  B
0  0  0  1  1  1
1  1  1  1  1  0
2  0  1  1  0  1
3  1  1  1  0  0
4  0  1  0  0  1
Desired output:
   A    B    C
0  0  0;1  1;1
1  1  1;0  1;1
2  0  1;1  1;0
3  1  1;0  1;0
4  0  1;1  0;0
Any pointers are highly appreciated.
Upvotes: 2
Views: 791
Reputation: 17911
You can transpose, groupby, join strings and transpose back:
df.T.astype('str').groupby(level=0).agg(';'.join).T
Output:
   A    B    C
0  0  0;1  1;1
1  1  1;0  1;1
2  0  1;1  1;0
3  1  1;0  1;0
4  0  1;1  0;0
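For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the frame from the question and applies the same transpose, group, join, transpose chain. The variable names `data` and `result` are my own, not from the question.

```python
import pandas as pd

# Sample data from the question; 'data' and 'result' are illustrative names.
data = {'A': [0, 1, 0, 1, 0], 'B': [0, 1, 1, 1, 1], 'C': [1, 1, 1, 1, 0],
        'D': [1, 1, 0, 0, 0], 'E': [1, 0, 1, 0, 1]}
df = pd.DataFrame(data)
df.columns = ['A', 'B', 'C', 'C', 'B']  # introduce duplicate column names

# Transpose so the duplicate names become index labels, cast to str,
# group rows sharing a label, join each group's values, transpose back.
result = df.T.astype(str).groupby(level=0).agg(';'.join).T
print(result)
```

Within each group the values keep their original left-to-right column order, so the first `B` column comes before the second one in the joined string.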
Upvotes: 1
Reputation: 397
You can try this code:
def function(x):
    # x is the sub-DataFrame of all columns sharing one name; join each row's values
    return x.apply(';'.join, axis=1)
df = df.astype(str).groupby(df.columns, axis=1).agg(function)
Upvotes: 2
Reputation: 863801
You can group by column names. For a duplicated name each group is a DataFrame, so use apply with join to join the values per row:
df = df.astype(str).groupby(df.columns, axis=1).agg(lambda x: x.apply(';'.join, axis=1))
Or:
df = df.astype(str).groupby(df.columns, axis=1).agg(lambda x: [';'.join(y) for y in x.to_numpy()])
print(df)
   A    B    C
0  0  0;1  1;1
1  1  1;0  1;1
2  0  1;1  1;0
3  1  1;0  1;0
4  0  1;1  0;0
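As far as I know, recent pandas releases deprecate the `axis=1` argument of `groupby`, so the calls above may warn or fail depending on your version. Below is a rough sketch of the same row-wise join without grouping along columns. The helper name `join_duplicate_columns` and its `sep` parameter are hypothetical, not part of pandas.

```python
import pandas as pd

def join_duplicate_columns(frame, sep=';'):
    """Hypothetical helper: merge same-named columns into one string column."""
    as_str = frame.astype(str)
    out = {}
    # dict.fromkeys gives the unique column names in order of first appearance
    for name in dict.fromkeys(as_str.columns):
        # A boolean mask selects every column carrying this name (always a DataFrame)
        block = as_str.loc[:, as_str.columns == name]
        # Join the values of each row across the selected columns
        out[name] = [sep.join(row) for row in block.to_numpy()]
    return pd.DataFrame(out, index=frame.index)

# Sample data from the question
data = {'A': [0, 1, 0, 1, 0], 'B': [0, 1, 1, 1, 1], 'C': [1, 1, 1, 1, 0],
        'D': [1, 1, 0, 0, 0], 'E': [1, 0, 1, 0, 1]}
df = pd.DataFrame(data)
df.columns = ['A', 'B', 'C', 'C', 'B']

result = join_duplicate_columns(df)
print(result)
```

This keeps the columns in order of first appearance instead of sorting them the way `groupby` does. For this example the two orders are the same.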
Upvotes: 5