ybonda

Reputation: 1710

Pandas DataFrame - Convert columns into JSON and add as a new column

Consider the following DataFrame, which I loaded from a MySQL table of about 11k rows:

col1 |  col2 | col3  | col4
-----------------------------
 cat | black | small | lovely
-----------------------------
 dog | white | medium| brave 
-----------------------------
mice | grey  | tinny | fast

...

I want to convert it dynamically to the following:

col1 |     newcol
------------------------------------------------------------
 cat | {"col2": "black", "col3": "small", "col4": "lovely"}
------------------------------------------------------------
 dog | {"col2": "white", "col3": "medium", "col4": "brave"}
------------------------------------------------------------
mice | {"col2": "grey", "col3": "tinny", "col4": "fast"}

...

Upvotes: 4

Views: 3790

Answers (6)

JavaGoPro

Reputation: 242

In addition to Anky's answer I found this article that describes more manipulations: https://codeflex.co/mysql-table-migration-with-pandas-dataframe/

Upvotes: 1

anky

Reputation: 75080

You can use agg with dict (or a JSON serializer) along axis=1, so that each row is aggregated into a single value.

For dictionary:

out = df[['col1']].assign(new_col=df.iloc[:,1:].agg(dict,1))

For json:

out = df[['col1']].assign(new_col=df.iloc[:,1:].agg(pd.Series.to_json,1))

print(out)

   col1                                            new_col
0   cat  {'col2': 'black', 'col3': 'small', 'col4': 'lo...
1   dog  {'col2': 'white', 'col3': 'medium', 'col4': 'b...
2  mice  {'col2': 'grey', 'col3': 'tinny', 'col4': 'fast'}
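The printed frame above shows the dict variant. As a minimal sketch of the JSON variant, assuming the sample data from the question, the snippet below builds one JSON string per row with Series.to_json and checks that it parses back to the expected mapping. The names json_col and first are illustrative only.

```python
import json
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "col1": ["cat", "dog", "mice"],
    "col2": ["black", "white", "grey"],
    "col3": ["small", "medium", "tinny"],
    "col4": ["lovely", "brave", "fast"],
})

# Serialize every column except the first into one JSON string per row.
json_col = df.iloc[:, 1:].apply(lambda row: row.to_json(), axis=1)
out = df[["col1"]].assign(new_col=json_col)

# Each cell is a str; json.loads recovers the original mapping.
first = json.loads(out.loc[0, "new_col"])
```

Storing strings rather than dicts is usually what you want if the column is going to be written back to a database text or JSON field.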

Upvotes: 10

Ferris

Reputation: 5601

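Use df.to_json(orient='records') to dump the rows as a JSON record list, load that string back with json.loads to get a list of dicts, and assign the list to a new column. df.to_dict(orient='records') produces the same list of dicts directly.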

import pandas as pd
df = pd.DataFrame({'col1': ['cat', 'dog', 'mice'], 
        'col2' : ['black', 'white', 'grey'], 
        'col3' : ['small', 'medium', 'tinny']})

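# Build one dict per row from every column except the first.
# Alternative via JSON text (needs `import json`):
# data_json = df.iloc[:, 1:].to_json(orient='records')
# data = json.loads(data_json)
data = df.iloc[:, 1:].to_dict(orient='records')

# keep first column
dfn = df.iloc[:, [0]].copy()
dfn['newcol'] = data
# To store JSON strings instead of dicts (the form printed below; needs `import json`):
# dfn['newcol'] = pd.Series(data).map(json.dumps)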

dfn

   col1                               newcol
0   cat   {"col2": "black", "col3": "small"}
1   dog  {"col2": "white", "col3": "medium"}
2  mice    {"col2": "grey", "col3": "tinny"}

data_json (type str) looks like this:

[{"col2":"black","col3":"small"},{"col2":"white","col3":"medium"},{"col2":"grey","col3":"tinny"}]
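For reference, here is the to_json round trip from this answer written out in full, as a sketch under the same three-column sample data. One reason to go through to_json rather than to_dict is that to_json already emits JSON-safe values (for example, missing values become null). The final list comprehension that re-serializes each record is an addition for illustration, not part of the original snippet.

```python
import json
import pandas as pd

df = pd.DataFrame({
    "col1": ["cat", "dog", "mice"],
    "col2": ["black", "white", "grey"],
    "col3": ["small", "medium", "tinny"],
})

# Dump all columns except the first as one JSON array of records (a str).
data_json = df.iloc[:, 1:].to_json(orient="records")

# Parse the array back into a list with one dict per row.
data = json.loads(data_json)

# Keep the first column and attach one JSON string per row.
dfn = df.iloc[:, [0]].copy()
dfn["newcol"] = [json.dumps(record) for record in data]
```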

Upvotes: 2

Hemant

Reputation: 1156

For the given requirement, I would suggest using itertuples to generate a list of dicts and assigning it to the DataFrame, as follows:

import pandas as pd 
data = {'col1': ['cat', 'dog', 'mice'], 'col2' : ['black', 'white', 'grey'], 'col3' : ['small', 'medium', 'tinny'], 'col4': ['lovely','brave','fast']} 
df = pd.DataFrame(data) 

def getDictColumn_df1(df, new_col_name="newcol", cols_from_start=1):
    # index=False keeps the row index out of each dict (otherwise every dict gets an 'Index' key)
    df[new_col_name] = [row._asdict() for row in df.iloc[:, cols_from_start:].itertuples(index=False)]
    return df[['col1', new_col_name]]

getDictColumn_df1(df)

To learn more about itertuples(), see the pandas documentation for DataFrame.itertuples.

For a slightly different case, where you want to keep the row index and pack all of each row's values into a single dict, you can do:

def getDictColumn_df2(df, new_col_name="newcol"):
    df[new_col_name] = tuple(map(lambda row: row._asdict(), df.itertuples(index=False)))
    return df[[new_col_name]]

getDictColumn_df2(df)

Upvotes: 0

Benjamin Rowell

Reputation: 1411

As you might expect, there are many ways to do this, but this is what came to mind:

>>> import pandas as pd
>>> d = {"col1": ["cat", 'dog', 'mice'], "col2": ["black", "white", "grey"], "col3": ["small", 'medium', 'tinny'], 'col4': ['lovely', 'brave','fast']}
>>> df = pd.DataFrame(d)
>>> pd.concat([df[['col1']], pd.DataFrame({"newcol": df[['col2','col3','col4']].to_dict(orient='records')})], axis=1)

If you do not know the column names in advance, you can select the columns by position instead. Here that means everything from column index 1 to the end.

>>> pd.concat([df[['col1']], pd.DataFrame({"newcol": df.iloc[:, 1:].to_dict(orient='records')})], axis=1)
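The same idea can be wrapped in a small reusable helper for the "dynamic" case in the question. This is only a sketch: pack_columns and its parameters (keep, new_col, as_json) are hypothetical names, not a pandas API. It leaves the input frame untouched and can return either dicts or JSON strings.

```python
import json
import pandas as pd

def pack_columns(df, keep, new_col="newcol", as_json=True):
    """Return a new frame holding `keep` plus one packed column.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Every column that is not kept gets packed, whatever it is called.
    records = df.drop(columns=keep).to_dict(orient="records")
    if as_json:
        # default=str is a fallback for values json cannot serialize itself.
        records = [json.dumps(record, default=str) for record in records]
    out = df[keep].copy()
    out[new_col] = records
    return out

df = pd.DataFrame({
    "col1": ["cat", "dog", "mice"],
    "col2": ["black", "white", "grey"],
    "col3": ["small", "medium", "tinny"],
    "col4": ["lovely", "brave", "fast"],
})

packed_json = pack_columns(df, ["col1"])
packed_dict = pack_columns(df, ["col1"], as_json=False)
```

Passing the columns to keep, rather than the columns to pack, means the helper keeps working if the source table gains or loses columns.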

Upvotes: 2

basicknowledge

Reputation: 93

I am using three columns in the example below.

data = {'col1': ['cat', 'dog', 'mice'], 'col2' : ['black', 'white', 'grey'], 'col3' : ['small', 'medium', 'tinny']}
import pandas as pd
df = pd.DataFrame(data)
col = list(df.columns)

We can use a lambda function as follows

df.apply(lambda x: {col[1]: x[col[1]], col[2]: x[col[2]]}, axis=1)

You can add it to the dataframe as follows

df['new_col'] = df.apply(lambda x: {col[1]: x[col[1]], col[2]: x[col[2]]}, axis=1)

This produces the following output.

df
   col1   col2    col3                              new_col
0   cat  black   small   {'col2': 'black', 'col3': 'small'}
1   dog  white  medium  {'col2': 'white', 'col3': 'medium'}
2  mice   grey   tinny    {'col2': 'grey', 'col3': 'tinny'}

And then drop the unwanted columns using df.drop

This should produce the required output.


df.drop(['col2', 'col3'], axis = 1)
   col1                              new_col
0   cat   {'col2': 'black', 'col3': 'small'}
1   dog  {'col2': 'white', 'col3': 'medium'}
2  mice    {'col2': 'grey', 'col3': 'tinny'}
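The lambda above names each column explicitly. As a sketch of the same approach without hard-coded columns (which is what the question means by "dynamically"), the dict can be built from whatever columns follow the first one. The name value_cols is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["cat", "dog", "mice"],
    "col2": ["black", "white", "grey"],
    "col3": ["small", "medium", "tinny"],
})

# Every column after the first, captured before new_col is added.
value_cols = list(df.columns[1:])

# Build one dict per row from those columns, looked up by label.
df["new_col"] = df.apply(lambda row: {c: row[c] for c in value_cols}, axis=1)

# df.drop returns a new frame; the packed source columns are left out of it.
result = df.drop(columns=value_cols)
```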

Upvotes: 0
