How can I convert multiple columns in a pandas dataframe into a column containing dictionaries of those columns?

Question

I have a very large dataframe containing the following columns:

RegAddress.CareOf,RegAddress.POBox,RegAddress.AddressLine1,RegAddress.AddressLine2,RegAddress.PostTown,RegAddress.County,RegAddress.Country,RegAddress.PostCode

I am inserting this dataframe (loaded from a CSV) into a relational database, so I would like to convert these columns into a single column, RegAddress, containing a dictionary with the keys CareOf, POBox, AddressLine1, and so on. I cannot figure out how to do this in a vectorised fashion, i.e. go from:

RegAddress.CareOf,RegAddress.POBox
Me,2
You,3

to:

RegAddress
{"CareOf": "Me", "POBox": 2}
{"CareOf": "You", "POBox": 3}

efficiently.

frisko · Accepted Answer

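You can use the .apply() method with axis=1 to build one dictionary per row. Note that this calls a Python function once per row, so it is not truly vectorised:

import pandas as pd

selected_cols = ['RegAddress.CareOf', 'RegAddress.POBox']

# Build one dict per row, keyed by the part of the column name after the dot.
df2 = pd.DataFrame()
df2['RegAddress'] = df[selected_cols].apply(
    lambda row: {col.split('.', 1)[1]: row[col] for col in selected_cols},
    axis=1
)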

Result:

                      RegAddress
0   {'CareOf': 'Me', 'POBox': 2}
1  {'CareOf': 'You', 'POBox': 3}
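Since the question asks for efficiency on a very large dataframe, here is a sketch of an alternative to the row-wise .apply() above. It is not part of the original answer. It assumes the column names all share the "RegAddress." prefix, and it uses DataFrame.rename and DataFrame.to_dict(orient="records") to build every dictionary in a single call. The sample data and the names short_names and records are only for illustration.

```python
import pandas as pd

# Sample data mirroring the two-column example from the question.
df = pd.DataFrame({
    "RegAddress.CareOf": ["Me", "You"],
    "RegAddress.POBox": [2, 3],
})

selected_cols = ["RegAddress.CareOf", "RegAddress.POBox"]

# Map each full column name to the part after the first dot,
# e.g. "RegAddress.CareOf" -> "CareOf".
short_names = {col: col.split(".", 1)[1] for col in selected_cols}

# to_dict(orient="records") returns a list with one dict per row, in row order.
records = df[selected_cols].rename(columns=short_names).to_dict(orient="records")

# Wrap the list in an object Series aligned with the original index.
df2 = pd.DataFrame(
    {"RegAddress": pd.Series(records, index=df.index, dtype=object)}
)

print(df2)
```

This still creates one Python dict per row, so memory use grows with the number of rows, but it avoids calling a lambda for each row. Whether it is faster on your data is worth measuring rather than assuming.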
