Tanmay Bhatnagar
Tanmay Bhatnagar

Reputation: 2470

Concatenate rows with same column value in a single pandas dataframe

I have a pandas dataframe like so :

id code mean count  

1   A    32    22  
1   B    9     56
1   C    25    78
2   A    33    35
2   B    11    66

Basically, for every ID there might be N number of entries and N varies for each ID, for some it might be 1, 2 for some it might be 3 or more. I want to concatenate all rows having the same ID. I know some columns will end up empty for some IDs since their 'N' will be lower as compared to the N of other IDs so I want to fill out -1 for those empty columns

Final dataframe will look like this:

id code1 mean1 count1 code2 mean2 count2 code3 mean3 count3

1   A    32      22    B     9     56     C     25    78
2   A    33      35    B     11    66     -1    -1    -1

Please ask for any additional info that might be required.

EDIT
Please take care that you are using vanilla pandas and NOT modin.pandas or any other version of pandas. I ran into problems while trying to execute the problem when using modin.pandas but vanilla pandas works just fine.

Upvotes: 1

Views: 691

Answers (2)

Billy Bonaros
Billy Bonaros

Reputation: 1721

I think this will work for you. If you want to add numbers for the suffix just add a counter.

final=pd.DataFrame()
for i in df['code'].unique():
    final=pd.concat([final,df.query(f'code=="{i}"').set_index('id').add_suffix(f"_{i}")],axis=1).fillna(-1)


   code_A  mean_A  count_A code_B  mean_B  count_B code_C  mean_C  count_C
id                                                                        
1       A      32       22      B       9       56      C    25.0     78.0
2       A      33       35      B      11       66     -1    -1.0     -1.0

Upvotes: 0

jezrael
jezrael

Reputation: 862641

Use GroupBy.cumcount for counter, then reshape by DataFrame.set_index and DataFrame.unstack, sorting second level of MultiIndex by DataFrame.sort_index and last flatten MultiIndex by join:

df = pd.DataFrame({'id': [1, 1, 1, 2, 2], 
                   'code': ['A', 'B', 'C', 'A', 'B'],
                   'mean': [32, 9, 25, 33, 11], 
                   'count': [22, 56, 78, 35, 66]})

print (df)
   id code  mean  count
0   1    A    32     22
1   1    B     9     56
2   1    C    25     78
3   2    A    33     35
4   2    B    11     66

print (df.columns)
Index(['id', 'code', 'mean', 'count'], dtype='object')


print (df.columns.tolist())
['id', 'code', 'mean', 'count']

df['g'] = df.groupby('id').cumcount().add(1)
df = (df.set_index(['id','g'])
        .unstack(fill_value=-1)
        .sort_index(level=1, axis=1))

df.columns = df.columns.map(lambda x: f'{x[0]}{x[1]}')

For convert id to column use reset_index:

df = df.reset_index()
print (df)
   id code1  count1  mean1 code2  count2  mean2 code3  count3  mean3
0   1     A      22     32     B      56      9     C      78     25
1   2     A      35     33     B      66     11    -1      -1     -1
df = df.reset_index()

Upvotes: 2

Related Questions