Reputation: 1710
Consider the following DataFrame, which I loaded from a MySQL table with 11k rows:
col1 | col2 | col3 | col4
-----------------------------
cat | black | small | lovely
-----------------------------
dog | white | medium| brave
-----------------------------
mice | grey | tinny | fast
...
I want to convert it dynamically to the following:
col1 | newcol
------------------------------------------------------------
cat | {"col2": "black", "col3": "small", "col4": "lovely"}
------------------------------------------------------------
dog | {"col2": "white", "col3": "medium", "col4": "brave"}
------------------------------------------------------------
mice | {"col2": "grey", "col3": "tinny", "col4": "fast"}
...
Upvotes: 4
Views: 3790
Reputation: 242
In addition to Anky's answer, I found this article, which describes more manipulations: https://codeflex.co/mysql-table-migration-with-pandas-dataframe/
Upvotes: 1
Reputation: 75080
You can use agg with dict on axis=1.
For dictionary:
out = df[['col1']].assign(new_col=df.iloc[:,1:].agg(dict,1))
For json:
out = df[['col1']].assign(new_col=df.iloc[:,1:].agg(pd.Series.to_json,1))
print(out)
col1 new_col
0 cat {'col2': 'black', 'col3': 'small', 'col4': 'lo...
1 dog {'col2': 'white', 'col3': 'medium', 'col4': 'b...
2 mice {'col2': 'grey', 'col3': 'tinny', 'col4': 'fast'}
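To make the difference between the two variants concrete, here is a small self-contained sketch. It rebuilds the question's sample frame by hand (an assumption, since the real data comes from MySQL) and compares what each variant stores: the dict variant should hold Python dicts, while the to_json variant should hold strings that parse back to the same dicts.

```python
import json

import pandas as pd

# Sample data copied from the question; the real frame would come from MySQL.
df = pd.DataFrame({
    "col1": ["cat", "dog", "mice"],
    "col2": ["black", "white", "grey"],
    "col3": ["small", "medium", "tinny"],
    "col4": ["lovely", "brave", "fast"],
})

rest = df.iloc[:, 1:]  # every column except the first

# Variant 1: each cell of new_col is a Python dict.
as_dict = df[["col1"]].assign(new_col=rest.agg(dict, axis=1))

# Variant 2: each cell of new_col is a JSON string.
as_json = df[["col1"]].assign(new_col=rest.agg(pd.Series.to_json, axis=1))

first_dict = as_dict.loc[0, "new_col"]
first_json = as_json.loc[0, "new_col"]

print(type(first_dict).__name__, type(first_json).__name__)
print(json.loads(first_json) == first_dict)
```

Which one you want probably depends on the destination: dicts are easier to keep working with in Python, while JSON strings are usually what you would write back to a text or JSON column in a database.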
Upvotes: 10
Reputation: 5601
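Use df.to_json(orient='records') to dump the rows as a JSON record list, load that JSON back into a list of dicts with json.loads, and assign the list to a new column. df.to_dict(orient='records') builds the same kind of list without the round trip through a string.
import json
import pandas as pd
df = pd.DataFrame({'col1': ['cat', 'dog', 'mice'],
                   'col2': ['black', 'white', 'grey'],
                   'col3': ['small', 'medium', 'tinny']})
# create json column
# option 1: round trip through a JSON string
# data_json = df.iloc[:, 1:].to_json(orient='records')
# data = json.loads(data_json)
# option 2: build the list of dicts directly
data = df.iloc[:, 1:].to_dict(orient='records')
# keep first column
dfn = df.iloc[:, [0]].copy()
dfn['newcol'] = data
# to store JSON strings (double-quoted, as displayed below) instead of dicts:
# dfn['newcol'] = pd.Series(data).map(json.dumps)
dfn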
col1 newcol
0 cat {"col2": "black", "col3": "small"}
1 dog {"col2": "white", "col3": "medium"}
2 mice {"col2": "grey", "col3": "tinny"}
data_json (type str):
[{"col2":"black","col3":"small"},{"col2":"white","col3":"medium"},{"col2":"grey","col3":"tinny"}]
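One thing to watch with the pd.Series(data) variant: a new Series gets a fresh 0..n-1 index, and assigning a Series to a column aligns on index labels. If the frame was filtered first, some rows can end up as NaN. The sketch below illustrates this under the assumption that a row was dropped beforehand; `subset`, `misaligned` and `out` are illustrative names, not part of the answer.

```python
import json

import pandas as pd

df = pd.DataFrame({"col1": ["cat", "dog", "mice"],
                   "col2": ["black", "white", "grey"],
                   "col3": ["small", "medium", "tinny"]})

# Dropping a row leaves a non-contiguous index: [0, 2].
subset = df[df["col1"] != "dog"]
records = subset.iloc[:, 1:].to_dict(orient="records")

# A bare Series is indexed 0..1, so label 2 finds no match and becomes NaN.
misaligned = subset.iloc[:, [0]].copy()
misaligned["newcol"] = pd.Series(records)

# Reusing the frame's own index keeps every row paired with its record.
out = subset.iloc[:, [0]].copy()
out["newcol"] = pd.Series(records, index=out.index).map(json.dumps)

print(out)
```

Assigning the plain list (dfn['newcol'] = data) avoids the problem as well, because a list is assigned by position rather than by label.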
Upvotes: 2
Reputation: 1156
For the given requirement, I would suggest using itertuples to generate a list of dicts and assigning it to the DataFrame as follows:
import pandas as pd
data = {'col1': ['cat', 'dog', 'mice'], 'col2' : ['black', 'white', 'grey'], 'col3' : ['small', 'medium', 'tinny'], 'col4': ['lovely','brave','fast']}
df = pd.DataFrame(data)
def getDictColumn_df1(df, new_col_name="newcol", cols_from_start=1):
    # index=False keeps the row index out of each dict (no 'Index' key)
    rows = df.iloc[:, cols_from_start:].itertuples(index=False)
    df[new_col_name] = [row._asdict() for row in rows]
    return df[['col1', new_col_name]]
getDictColumn_df1(df)
to know more about itertuples() check this out
For a slightly different case, when you want to keep all indexes and convert all values to a single dict, you can do:
def getDictColumn_df2(df, new_col_name="newcol"):
    # one dict per row, built from every column currently in df
    df[new_col_name] = [row._asdict() for row in df.itertuples(index=False)]
    return df[[new_col_name]]
getDictColumn_df2(df)
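A caveat worth checking before relying on itertuples for this: it builds namedtuples, and column names that are not valid Python identifiers (a name with a space, for example) are replaced by positional placeholder names. The sketch below uses a made-up column name, "fur color", to compare the keys produced by itertuples with those from to_dict(orient='records'); it is only an illustration, not part of the answer.

```python
import pandas as pd

# "fur color" is a hypothetical column name containing a space.
df = pd.DataFrame({"col1": ["cat", "dog"],
                   "fur color": ["black", "white"]})

rest = df.iloc[:, 1:]

# Keys come from namedtuple field names, so invalid names are rewritten.
from_tuples = [row._asdict() for row in rest.itertuples(index=False)]

# Keys are the original column labels.
from_records = rest.to_dict(orient="records")

print(list(from_tuples[0].keys()))
print(list(from_records[0].keys()))
```

If the MySQL column names are all plain identifiers this does not matter, but to_dict(orient='records') may be the safer default when they are not under your control.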
Upvotes: 0
Reputation: 1411
As you might expect, there are many ways to do this, but this is what came to mind:
>>> import pandas as pd
>>> d = {"col1": ["cat", 'dog', 'mice'], "col2": ["black", "white", "grey"], "col3": ["small", 'medium', 'tinny'], 'col4': ['lovely', 'brave','fast']}
>>> df = pd.DataFrame(d)
>>> pd.concat([df[['col1']], pd.DataFrame({"newcol": df[['col2','col3','col4']].to_dict(orient='records')})], axis=1)
If you do not know the column names in advance, you can select the columns by position instead. Here, that means from column 1 to the end:
>>> pd.concat([df[['col1']], pd.DataFrame({"newcol": df.iloc[:, 1:].to_dict(orient='records')})], axis=1)
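If this conversion is needed in more than one place, the same idea could be wrapped in a small helper that selects the key column by name rather than by position. The function name `pack_columns` and its parameters are hypothetical, and this is only a sketch that assumes the key column exists in the frame.

```python
import pandas as pd


def pack_columns(df, key="col1", new_col="newcol"):
    """Return a two-column frame: `key` plus a dict of every other column."""
    # Everything except the key column becomes one dict per row.
    records = df.drop(columns=[key]).to_dict(orient="records")
    out = df[[key]].copy()   # copy, so the caller's frame is left untouched
    out[new_col] = records   # a plain list is assigned by position
    return out


df = pd.DataFrame({"col1": ["cat", "dog", "mice"],
                   "col2": ["black", "white", "grey"],
                   "col3": ["small", "medium", "tinny"],
                   "col4": ["lovely", "brave", "fast"]})

packed = pack_columns(df)
print(packed)
```

Selecting by name means the key does not have to be the first column, at the cost of having to know what it is called.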
Upvotes: 2
Reputation: 93
I am using three columns in the example below.
data = {'col1': ['cat', 'dog', 'mice'], 'col2' : ['black', 'white', 'grey'], 'col3' : ['small', 'medium', 'tinny']}
import pandas as pd
df = pd.DataFrame(data)
col = list(df.columns)
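We can use a lambda function as follows (x.iloc selects by position; plain x[1] on a label-indexed row is deprecated in recent pandas):
df.apply(lambda x: {col[1]: x.iloc[1], col[2]: x.iloc[2]}, axis=1)
You can add it to the DataFrame as follows:
df['new_col'] = df.apply(lambda x: {col[1]: x.iloc[1], col[2]: x.iloc[2]}, axis=1)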
This produces the following output.
df
col1 col2 col3 new_col
0 cat black small {'col2': 'black', 'col3': 'small'}
1 dog white medium {'col2': 'white', 'col3': 'medium'}
2 mice grey tinny {'col2': 'grey', 'col3': 'tinny'}
Then drop the unwanted columns using df.drop. This should produce the required output:
df.drop(['col2', 'col3'], axis = 1)
col1 new_col
0 cat {'col2': 'black', 'col3': 'small'}
1 dog {'col2': 'white', 'col3': 'medium'}
2 mice {'col2': 'grey', 'col3': 'tinny'}
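The lambda above names each column explicitly, so it has to be edited whenever the table gains a column. Since the question asks for a dynamic conversion, a possible variation is to slice each row by position and let Series.to_dict build the mapping. This is a sketch under the assumption that the first column is the one to keep.

```python
import pandas as pd

data = {"col1": ["cat", "dog", "mice"],
        "col2": ["black", "white", "grey"],
        "col3": ["small", "medium", "tinny"]}
df = pd.DataFrame(data)

# row.iloc[1:] is every value after the first; to_dict keeps the column labels.
new_col = df.apply(lambda row: row.iloc[1:].to_dict(), axis=1)

# Build the result without modifying df, so no separate drop step is needed.
out = df[["col1"]].assign(new_col=new_col)
print(out)
```

Row-wise apply is convenient but tends to be slow on large frames; for a table of around 11k rows that is unlikely to be a problem.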
Upvotes: 0