Reputation: 1117
I have a dynamic number of columns in my dataframe, and a single record can span more than one row. The first 2 columns are the key columns. When the key columns match, I have to append each row's data into a single row, creating as many columns as required.
Input is below (dataframe), where c1, c2, etc. are each in their own column:
row 1: A 1 c1 c2 c3 ... c20
row 2: A 1 c21 ... c25
row 3: A 1 c26 ... c35
row 4: A 2 d1 d2 ... d21
row 5: A 2 d22 ... d27
I tried df.groupby(___first 2 column names____).first().reset_index(), but that returns only the first row of each group, since we are using first(). Is there a function in Python to do this?
output required: (dataframe)
row 1: A 1 c1 c2 ... c35 (each value in its own column)
row 2: A 2 d1 ... d27 (each value in its own column)
Upvotes: 1
Views: 3490
Reputation: 862511
Use GroupBy.cumcount for a Series counting rows within each group, then DataFrame.set_index with the keys plus the counter, DataFrame.unstack, DataFrame.sort_index, and last flatten the MultiIndex columns in a list comprehension:
print (df)
   a  b    c    d    e    f
0  A  1   c1   c2   c3  c20
1  A  1  c21  c22  c23  c24
2  A  1  c26  c27  c28  c29
3  A  2   d1   d2  d21  d22
4  A  2  d22  d27  d28  d29
s = df.groupby(['a','b']).cumcount()
df1 = df.set_index(['a', 'b', s]).unstack().sort_index(level=1, axis=1)
df1.columns = [f'{x}{y}' for x, y in df1.columns]
df1 = df1.reset_index()
print (df1)
a b c0 d0 e0 f0 c1 d1 e1 f1 c2 d2 e2 f2
0 A 1 c1 c2 c3 c20 c21 c22 c23 c24 c26 c27 c28 c29
1 A 2 d1 d2 d21 d22 d22 d27 d28 d29 NaN NaN NaN NaN
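A self-contained version of the steps above, with the sample frame reconstructed from the printout (the column names a–f and the cell values are taken from the answer's example, not the asker's real data):

```python
import pandas as pd

# Sample frame matching the printed df above
df = pd.DataFrame({
    'a': ['A', 'A', 'A', 'A', 'A'],
    'b': [1, 1, 1, 2, 2],
    'c': ['c1', 'c21', 'c26', 'd1', 'd22'],
    'd': ['c2', 'c22', 'c27', 'd2', 'd27'],
    'e': ['c3', 'c23', 'c28', 'd21', 'd28'],
    'f': ['c20', 'c24', 'c29', 'd22', 'd29'],
})

# Counter per key group: 0, 1, 2 for (A, 1); 0, 1 for (A, 2)
s = df.groupby(['a', 'b']).cumcount()

# Move keys + counter into the index, pivot the counter level
# into the columns, then order columns by the counter level
df1 = df.set_index(['a', 'b', s]).unstack().sort_index(level=1, axis=1)

# Flatten the column MultiIndex, e.g. ('c', 0) -> 'c0'
df1.columns = [f'{x}{y}' for x, y in df1.columns]
df1 = df1.reset_index()
print(df1)
```

The shorter group (A, 2) has no third row, so its c2/d2/e2/f2 cells come out as NaN, exactly as in the printed result.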
Upvotes: 3