Reputation: 2470
I have a pandas dataframe like so :
id code mean count
1 A 32 22
1 B 9 56
1 C 25 78
2 A 33 35
2 B 11 66
Basically, for every ID there might be N number of entries and N varies for each ID, for some it might be 1, 2 for some it might be 3 or more. I want to concatenate all rows having the same ID. I know some columns will end up empty for some IDs since their 'N' will be lower as compared to the N of other IDs so I want to fill out -1 for those empty columns
Final dataframe will look like this:
id code1 mean1 count1 code2 mean2 count2 code3 mean3 count3
1 A 32 22 B 9 56 C 25 78
2 A 33 35 B 11 66 -1 -1 -1
Please ask for any additional info that might be required.
EDIT
Please take care that you are using vanilla pandas and NOT modin.pandas or any other version of pandas. I ran into problems while trying to execute the problem when using modin.pandas but vanilla pandas works just fine.
Upvotes: 1
Views: 691
Reputation: 1721
I think this will work for you. If you want to add numbers for the suffix just add a counter.
final=pd.DataFrame()
for i in df['code'].unique():
final=pd.concat([final,df.query(f'code=="{i}"').set_index('id').add_suffix(f"_{i}")],axis=1).fillna(-1)
code_A mean_A count_A code_B mean_B count_B code_C mean_C count_C
id
1 A 32 22 B 9 56 C 25.0 78.0
2 A 33 35 B 11 66 -1 -1.0 -1.0
Upvotes: 0
Reputation: 862641
Use GroupBy.cumcount
for counter, then reshape by DataFrame.set_index
and DataFrame.unstack
, sorting second level of MultiIndex
by DataFrame.sort_index
and last flatten MultiIndex by join
:
df = pd.DataFrame({'id': [1, 1, 1, 2, 2],
'code': ['A', 'B', 'C', 'A', 'B'],
'mean': [32, 9, 25, 33, 11],
'count': [22, 56, 78, 35, 66]})
print (df)
id code mean count
0 1 A 32 22
1 1 B 9 56
2 1 C 25 78
3 2 A 33 35
4 2 B 11 66
print (df.columns)
Index(['id', 'code', 'mean', 'count'], dtype='object')
print (df.columns.tolist())
['id', 'code', 'mean', 'count']
df['g'] = df.groupby('id').cumcount().add(1)
df = (df.set_index(['id','g'])
.unstack(fill_value=-1)
.sort_index(level=1, axis=1))
df.columns = df.columns.map(lambda x: f'{x[0]}{x[1]}')
For convert id
to column use reset_index
:
df = df.reset_index()
print (df)
id code1 count1 mean1 code2 count2 mean2 code3 count3 mean3
0 1 A 22 32 B 56 9 C 78 25
1 2 A 35 33 B 66 11 -1 -1 -1
df = df.reset_index()
Upvotes: 2