Reputation: 49
I have a table with duplicate column names, and I want to combine every set of columns that share a name into a single column.
I have tried merge and concat, among other things, but with no luck.
import pandas as pd

data = [['a','a','c'],['a','b','d'],['a','c','c']]
df = pd.DataFrame(data, columns=['col1','col2','col1'])
df
col1 col2 col1
a a c
a b d
a c c
I expect to end up with two columns: col1 containing a, a, a, c, d, c and col2 containing a, b, c, NaN, NaN, NaN.
Upvotes: 3
Views: 47
Reputation: 402553
First stack, then unstack. We will need to do a little bit more before we can unstack the data.
u = df.stack()
# Number the repeated (row, column) pairs so the index is unique before unstacking.
(u.to_frame()
  .set_index(u.groupby(u.index).cumcount(), append=True)
  .unstack(1)
  .sort_index(level=1)[0]
  .reset_index(drop=True))
col1 col2
0 a a
1 a b
2 a c
3 c NaN
4 d NaN
5 c NaN
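The "little bit more" is a per-duplicate counter. A minimal sketch of that step, assuming a hand-built Series that stands in for the stacked data (the names index, occurrence and unique_index are only for illustration, and the grouping is done by index level rather than by u.index):

```python
import pandas as pd

# Hand-built stand-in for the stacked Series: one entry per (row, column name).
index = pd.MultiIndex.from_tuples([
    (0, 'col1'), (0, 'col2'), (0, 'col1'),
    (1, 'col1'), (1, 'col2'), (1, 'col1'),
    (2, 'col1'), (2, 'col2'), (2, 'col1'),
])
u = pd.Series(list('aacabdacc'), index=index)

# Number each repeat of the same (row, column name) pair:
# 0 for the first occurrence, 1 for the second.
occurrence = u.groupby(level=[0, 1]).cumcount()

# Appending the counter as a third index level makes every label unique,
# which is what a later unstack needs.
unique_index = u.to_frame().set_index(occurrence.to_numpy(), append=True).index

print(occurrence.tolist())     # [0, 0, 1, 0, 0, 1, 0, 0, 1]
print(u.index.is_unique)       # False
print(unique_index.is_unique)  # True
```

Without the counter, the two col1 entries in each row share one label, so they cannot be spread into separate cells.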
Another option is groupby, to_dict, and reconstruction.
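# Note: groupby(..., axis=1) is deprecated in pandas 2.1 and later.
dct = (df.groupby(df.columns, axis=1)
         # Flatten each group row by row; same as x.values.ravel().tolist()
         .apply(lambda x: [z for y in x.values for z in y])
         .to_dict())
pd.DataFrame.from_dict(dct, orient='index').T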
col1 col2
0 a a
1 c b
2 a c
3 d None
4 a None
5 c None
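Note that col1 here comes out as a, c, a, d, a, c rather than the a, a, a, c, d, c asked for, because the values are flattened row by row. A small sketch of the difference, assuming the two col1 columns are picked out with a boolean mask (the names block, row_major and column_major are only for illustration):

```python
import pandas as pd

data = [['a', 'a', 'c'], ['a', 'b', 'd'], ['a', 'c', 'c']]
df = pd.DataFrame(data, columns=['col1', 'col2', 'col1'])

# Select both 'col1' columns by position mask; the result is a 3x2 array.
block = df.loc[:, df.columns == 'col1'].to_numpy()

# Row-major flattening interleaves the two duplicates.
row_major = block.ravel().tolist()

# Column-major (Fortran order) flattening puts the first duplicate first.
column_major = block.ravel(order='F').tolist()

print(row_major)     # ['a', 'c', 'a', 'd', 'a', 'c']
print(column_major)  # ['a', 'a', 'a', 'c', 'd', 'c']
```

If the column-by-column order matters, flattening with order='F' in place of the list comprehension should give it.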
Upvotes: 2
Reputation: 323286
melt, then groupby with concat
d = {x: y['value'].reset_index(drop=True) for x, y in df.melt().groupby('variable')}
df = pd.concat(d, axis=1)  # axis must be passed by keyword in pandas 2.0+
df
Out[39]:
col1 col2
0 a a
1 a b
2 a c
3 c NaN
4 d NaN
5 c NaN
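The same dictionary-of-Series idea can also be sketched with positional selection only, which avoids any label-based operation on the duplicated names. This is an illustrative variant, not the code above; the names stacked and result are hypothetical:

```python
import pandas as pd

data = [['a', 'a', 'c'], ['a', 'b', 'd'], ['a', 'c', 'c']]
df = pd.DataFrame(data, columns=['col1', 'col2', 'col1'])

# For each distinct column name, concatenate every column carrying that name
# end to end (first duplicate, then the next), discarding the old row index.
stacked = {
    name: pd.concat(
        [df.iloc[:, i] for i, col in enumerate(df.columns) if col == name],
        ignore_index=True,
    )
    for name in df.columns.unique()
}

# Building a frame from Series aligns them on the new 0..n-1 index,
# so the shorter column is padded with NaN.
result = pd.DataFrame(stacked)
print(result)
```

Here col1 ends up with six values (a, a, a, c, d, c) and col2 with a, b, c followed by three NaN.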
Upvotes: 1