Reputation: 245
I have a dataframe with the following columns:
ID  A1  B1  C1  A2  B2  C2  A3  B3  C3
AA   1   3   6               4   0   6
BB   5   5   4   6   7   9
CC   5   5   5
I want to create a new column called Z that takes each row, groups the numbered columns into a JSON list of records, and uses the letter part of each column name as the key. After the JSON column is constructed, I want to drop all the other columns and keep only ID and Z.
Here is the output desired:
ID Z
AA [{"A":1, "B":3,"C":6},{"A":4, "B":0,"C":6}]
BB [{"A":5, "B":5,"C":4},{"A":6, "B":7,"C":9}]
CC [{"A":5, "B":5,"C":5}]
Here is my current attempt:
df2 = df.groupby(['ID']).apply(lambda x: x[['A1', 'B1', 'C1',
'A2', 'B2', 'C2', 'A3', 'B3', 'C3']].to_dict('records')).to_frame('Z').reset_index()
The problem is that I cannot rename the columns so that only the letter remains and the number is removed, as in the example above. The code above also does not split each group of three columns into its own object; I get one flat record per row instead of the separate objects shown in my example list. I would like to accomplish this in pandas if possible. Any guidance is greatly appreciated.
Upvotes: 4
Views: 652
Reputation: 163
I just added a few lines to Shubham's code and it worked for me:
import pandas as pd
from numpy import nan

data = pd.DataFrame({'ID': {0: 'AA', 1: 'BB', 2: 'CC'},
                     'A1': {0: 1, 1: 5, 2: 5}, 'B1': {0: 3, 1: 5, 2: 5}, 'C1': {0: 6, 1: 4, 2: 5},
                     'A2': {0: nan, 1: 6.0, 2: nan}, 'B2': {0: nan, 1: 7.0, 2: nan}, 'C2': {0: nan, 1: 9.0, 2: nan},
                     'A3': {0: 4.0, 1: nan, 2: nan}, 'B3': {0: 0.0, 1: nan, 2: nan}, 'C3': {0: 6.0, 1: nan, 2: nan}})

# move ID into the index
data = data.set_index('ID')

# split each column name into (letter, number), e.g. 'A1' -> ('A', '1')
data.columns = data.columns.str.split(r'(?=\d+$)', expand=True)

# stack the number level into the index and build one list of records per ID
d = data.stack().groupby(level=0).apply(pd.DataFrame.to_dict, 'records').reset_index(name='Z')

# turn the result into a plain dict keyed by ID
d = d.set_index('ID')
d.to_dict()['Z']
Now we can see we get the desired output. Thanks to @Shubham Sharma for the answer; I think this might help.
Upvotes: 0
Reputation: 71687
Convert the columns to a MultiIndex by splitting and expanding around a regex delimiter, then stack the dataframe to convert it to a MultiIndex series, then group the dataframe on level=0 and apply the to_dict function to create the records per ID:
s = df.set_index('ID')
s.columns = s.columns.str.split(r'(?=\d+$)', expand=True)
s.stack().groupby(level=0).apply(pd.DataFrame.to_dict, 'records').reset_index(name='Z')
ID Z
0 AA [{'A': 1.0, 'B': 3.0, 'C': 6.0}, {'A': 4.0, 'B': 0.0, 'C': 6.0}]
1 BB [{'A': 5.0, 'B': 5.0, 'C': 4.0}, {'A': 6.0, 'B': 7.0, 'C': 9.0}]
2 CC [{'A': 5.0, 'B': 5.0, 'C': 5.0}]
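The records above contain floats because the NaN padding forces a float dtype. If actual JSON strings with integer values are wanted, as in the question's desired output, here is one possible sketch building on the same approach (assuming all values are whole numbers; the extra dropna/astype steps are my addition, not part of the answer above):

```python
import json

import pandas as pd
from numpy import nan

df = pd.DataFrame({'ID': ['AA', 'BB', 'CC'],
                   'A1': [1, 5, 5], 'B1': [3, 5, 5], 'C1': [6, 4, 5],
                   'A2': [nan, 6, nan], 'B2': [nan, 7, nan], 'C2': [nan, 9, nan],
                   'A3': [4, nan, nan], 'B3': [0, nan, nan], 'C3': [6, nan, nan]})

s = df.set_index('ID')
# split each column name into (letter, number) to form a MultiIndex
s.columns = s.columns.str.split(r'(?=\d+$)', expand=True)

out = (s.stack()                      # number level moves into the index
        .dropna(how='all')            # drop the all-NaN groups per ID
        .astype(int)                  # undo the float upcast caused by NaN
        .groupby(level=0)
        .apply(pd.DataFrame.to_dict, 'records')
        .map(lambda recs: json.dumps(recs, default=int))  # default=int guards against numpy ints
        .reset_index(name='Z'))
print(out)
```

The `Z` column now holds real JSON strings such as `[{"A": 5, "B": 5, "C": 5}]` rather than Python lists of dicts.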
Upvotes: 1
Reputation: 1
import json

# create a column-name map for renaming the related columns
columns = dataset.columns
columns_map = {}
for i in columns:
    columns_map[i] = f"new {i}"

def change_row_to_json(row):
    new_dict = {}
    for index, value in enumerate(row):
        new_dict[columns_map[columns[index]]] = value
    return json.dumps(new_dict, indent=4)

dataset.loc[:, 'Z'] = dataset.apply(change_row_to_json, axis=1)
dataset = dataset[["ID", "Z"]]
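To reach the question's desired output with this kind of row-wise approach, one possible adaptation (a sketch assuming the letter-plus-digit column layout from the question; `row_to_json` is a name invented here) groups each row's values by the trailing digit before serializing:

```python
import json

import pandas as pd
from numpy import nan

df = pd.DataFrame({'ID': ['AA', 'BB', 'CC'],
                   'A1': [1, 5, 5], 'B1': [3, 5, 5], 'C1': [6, 4, 5],
                   'A2': [nan, 6, nan], 'B2': [nan, 7, nan], 'C2': [nan, 9, nan],
                   'A3': [4, nan, nan], 'B3': [0, nan, nan], 'C3': [6, nan, nan]})

def row_to_json(row):
    # one dict per trailing number: A1/B1/C1 -> record '1', A2/B2/C2 -> record '2', ...
    records = {}
    for col, value in row.items():
        letter, num = col[0], col[1:]
        if pd.notna(value):           # skip the NaN padding
            records.setdefault(num, {})[letter] = int(value)
    return json.dumps(list(records.values()))

df['Z'] = df.drop(columns='ID').apply(row_to_json, axis=1)
df = df[['ID', 'Z']]
print(df)
```

This avoids the MultiIndex machinery entirely at the cost of a Python-level loop per row.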
Upvotes: 0
Reputation: 1
Have you tried going line by line? I am not very good with pandas and Python, but I came up with this code. Hope it works for you.
toAdd = []
for row in dataset.values:
    toAddLine = {}
    i = 0
    for data in row:
        # missing cells are NaN rather than None, so use pd.notna to skip them
        if pd.notna(data):
            toAddLine["New Column Name " + dataset.columns[i]] = data
        i = i + 1
    toAdd.append(toAddLine)
dataset['Z'] = toAdd
dataset['Z']
Upvotes: 0