Abhinav Dhiman
Abhinav Dhiman

Reputation: 755

Moving data from rows to columns based on another column

I have a huge dataset with contents such as given below:

+------+------------------------------------------------------------------+----------------------------------+--+
| HHID |                             VAL_CD64                             |             VAL_CD32             |  |
+------+------------------------------------------------------------------+----------------------------------+--+
|  203 | 8c5bfd9b6755ffcdb85dc52a701120e0876640b69b2df0a314dc9e7c2f8f58a5 | 373aeda34c0b4ab91a02ecf55af58e15 |  |
|  203 | 0511dc19cb09f8f4ba3d140754dafb1471dacdbb6747cdb5a2bc38e278d229c8 | 6f3606577eadacef1b956307558a1efd |  |
|  203 | a18adc1bcae1b570a610b13565b82e5647f05fef8a4680bd6ccdd717cdd34af7 | 332321ab150879e930869c15b1d10c83 |  |
|  720 | f6c581becbac4ec1291dc4b9ce566334b1cb2c85e234e489e7fd5e1393bd8751 | 2c4f97a04f02db5a36a85f48dab39b5b |  |
|  720 | abad845107a699f5f99575f8ed43e0440d87a8fc7229c1a1db67793561f0f1c3 | 2111293e946703652070968b224875c9 |  |
|  348 | 25c7cf022e6651394fa5876814a05b8e593d8c7f29846117b8718c3dd951e496 | 5c80a555fcda02d028fc60afa29c4a40 |  |
|  348 | 67d9c0a4bb98900809bcfab1f50bef72b30886a7b48ff0e9eccf951ef06542f9 | 6c10cd11b805fa57d2ca36df91654576 |  |
|  348 | 05f1e412e7765c4b54a9acfd70741af545564f6fdfe48b073bfd3114640f5e37 | 6040b29107adf1a41c4f5964e0ff6dcb |  |
|  403 | 3e8da3d63c51434bcd368d6829c7cee490170afc32b5137be8e93e7d02315636 | 71a91c4768bd314f3c9dc74e9c7937e8 |  |
+------+------------------------------------------------------------------+----------------------------------+--+

HHID can be present in the file at a maximum of three times. If HHID is found once, then the VAL_CD64/VAL_CD32 should be moved to VAL1_CD64/VAL1_CD32 columns, if found 2nd time, second value should be moved to VAL2_CD64/VAL2_CD32 columns, and if found 3rd time, third value should be moved to VAL3_CD64/VAL3_CD32 columns. If value is not found, then these columns should be left blank.

Output should look something like this:

+------+------------------------------------------------------------------+------------------------------------------------------------------+------------------------------------------------------------------+----------------------------------+----------------------------------+----------------------------------+--+
| HHID |                            VAL1_CD64                             |                            VAL2_CD64                             |                            VAL3_CD64                             |            VAL1_CD32             |            VAL2_CD32             |            VAL3_CD32             |  |
+------+------------------------------------------------------------------+------------------------------------------------------------------+------------------------------------------------------------------+----------------------------------+----------------------------------+----------------------------------+--+
|  203 | 8c5bfd9b6755ffcdb85dc52a701120e0876640b69b2df0a314dc9e7c2f8f58a5 | 0511dc19cb09f8f4ba3d140754dafb1471dacdbb6747cdb5a2bc38e278d229c8 | a18adc1bcae1b570a610b13565b82e5647f05fef8a4680bd6ccdd717cdd34af7 | 373aeda34c0b4ab91a02ecf55af58e15 | 6f3606577eadacef1b956307558a1efd | 332321ab150879e930869c15b1d10c83 |  |
|  720 | f6c581becbac4ec1291dc4b9ce566334b1cb2c85e234e489e7fd5e1393bd8751 | abad845107a699f5f99575f8ed43e0440d87a8fc7229c1a1db67793561f0f1c3 |                                                                  | 2c4f97a04f02db5a36a85f48dab39b5b | 2111293e946703652070968b224875c9 |                                  |  |
|  348 | 25c7cf022e6651394fa5876814a05b8e593d8c7f29846117b8718c3dd951e496 | 67d9c0a4bb98900809bcfab1f50bef72b30886a7b48ff0e9eccf951ef06542f9 | 05f1e412e7765c4b54a9acfd70741af545564f6fdfe48b073bfd3114640f5e37 | 5c80a555fcda02d028fc60afa29c4a40 | 6c10cd11b805fa57d2ca36df91654576 | 6040b29107adf1a41c4f5964e0ff6dcb |  |
|  403 | 3e8da3d63c51434bcd368d6829c7cee490170afc32b5137be8e93e7d02315636 |                                                                  |                                                                  | 71a91c4768bd314f3c9dc74e9c7937e8 |                                  |                                  |  |
+------+------------------------------------------------------------------+------------------------------------------------------------------+------------------------------------------------------------------+----------------------------------+----------------------------------+----------------------------------+--+

I tried using pivot/melt in pandas but unable to get an idea to implement it. Can anyone help in giving me a lead?

Thanks

Upvotes: 0

Views: 55

Answers (1)

Ynjxsjmh
Ynjxsjmh

Reputation: 30002

One possible way is to combine VAL_CD32 and VAL_CD64 into list then split those list into columns:

df_ = df.groupby('HHID').agg({'VAL_CD32': list, 'VAL_CD64': list})

data = []
for col in df_.columns:
    d = pd.DataFrame(df_[col].values.tolist(), index=df_.index)
    d.columns = [f'{col}_{i}' for i in map(str, range(1, len(d.columns)+1))]
    data.append(d)

res = pd.concat(data, axis=1)
                            VAL_CD32_1                        VAL_CD32_2  \
HHID
203   373aeda34c0b4ab91a02ecf55af58e15  6f3606577eadacef1b956307558a1efd
348   5c80a555fcda02d028fc60afa29c4a40  6c10cd11b805fa57d2ca36df91654576
403   71a91c4768bd314f3c9dc74e9c7937e8                              None
720   2c4f97a04f02db5a36a85f48dab39b5b  2111293e946703652070968b224875c9

                            VAL_CD32_3  \
HHID
203   332321ab150879e930869c15b1d10c83
348   6040b29107adf1a41c4f5964e0ff6dcb
403                               None
720                               None

                                                            VAL_CD64_1  \
HHID
203   8c5bfd9b6755ffcdb85dc52a701120e0876640b69b2df0a314dc9e7c2f8f58a5
348   25c7cf022e6651394fa5876814a05b8e593d8c7f29846117b8718c3dd951e496
403   3e8da3d63c51434bcd368d6829c7cee490170afc32b5137be8e93e7d02315636
720   f6c581becbac4ec1291dc4b9ce566334b1cb2c85e234e489e7fd5e1393bd8751

                                                            VAL_CD64_2  \
HHID
203   0511dc19cb09f8f4ba3d140754dafb1471dacdbb6747cdb5a2bc38e278d229c8
348   67d9c0a4bb98900809bcfab1f50bef72b30886a7b48ff0e9eccf951ef06542f9
403                                                               None
720   abad845107a699f5f99575f8ed43e0440d87a8fc7229c1a1db67793561f0f1c3

                                                            VAL_CD64_3
HHID
203   a18adc1bcae1b570a610b13565b82e5647f05fef8a4680bd6ccdd717cdd34af7
348   05f1e412e7765c4b54a9acfd70741af545564f6fdfe48b073bfd3114640f5e37
403                                                               None
720                                                               None

Upvotes: 3

Related Questions