Reputation: 357
I have data in an Excel sheet and need to build a list of dictionaries from it.
The expected output looks like this:
d = [
    {
        "name": "dhdn",
        "usn": "1bm15mca13",
        "sub": ["c", "java", "python"],
        "marks": [90, 95, 98]
    },
    {
        "name": "subbu",
        "usn": "1bm15mca14",
        "sub": ["java", "perl"],
        "marks": [92, 91]
    },
    {
        "name": "paddu",
        "usn": "1bm15mca17",
        "sub": ["c#", "java"],
        "marks": [80, 81]
    }
]
I tried the code below, but it only works when grouping on two columns:
import pandas as pd
existing_excel_file = 'BHARTI_Model-4_Migration Service parameters - Input sheet_v1.0_DRAFT_26-02-2020.xls'
df_service = pd.read_excel(existing_excel_file, sheet_name='Sheet2')
df_service = df_service.fillna(method='ffill')
result = [{'name':k,'sub':g["sub"].tolist(),"marks":g["marks"].tolist()} for k,g in df_service.groupby(['name', 'usn'])]
print (result)
This is the output I get, but I want the output shown above:
[{'name': ('dhdn', '1bm15mca13'), 'sub': ['c', 'java', 'python'], 'marks': [90, 95, 98]}, {'name': ('paddu', '1bm15mca17'), 'sub': ['c#', 'java'], 'marks': [80, 81]}, {'name': ('subbu', '1bm15mca14'), 'sub': ['java', 'perl'], 'marks': [92, 91]}]
Upvotes: 2
Views: 81
Reputation: 577
All right, I solved it, although it took me a while.
The first part is the same as what you already have.
import pandas as pd
df = pd.read_excel('test.xlsx')
df = df.fillna(method='ffill')
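For context, here is a minimal sketch of what the forward fill does. It assumes the sheet only has name and usn filled in on the first row of each student (the real spreadsheet layout is not shown in the question), so the frame below is a made-up stand-in. It uses DataFrame.ffill(), which does the same thing as fillna(method='ffill'); recent pandas versions deprecate the latter spelling.

```python
import pandas as pd

# Hypothetical stand-in for the sheet: name/usn appear only on the
# first row of each student, the remaining cells are empty.
raw = pd.DataFrame({
    "name": ["dhdn", None, None, "subbu", None],
    "usn": ["1bm15mca13", None, None, "1bm15mca14", None],
    "sub": ["c", "java", "python", "java", "perl"],
    "marks": [90, 95, 98, 92, 91],
})

# Propagate the last seen value downward into the empty cells.
filled = raw.ffill()
print(filled["name"].tolist())
```

After the fill, every row carries its own name and usn, which is what makes grouping or counting by name possible.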
Then we need the unique names and the number of rows each one covers. I'm assuming there are as many unique names as there are unique usn values. The list counts stores those row counts.
unique_names = df['name'].unique()
unique_usn = df['usn'].unique()
counts = []
for name in unique_names:
    # Exact match per row; str.count would treat the name as a regex pattern
    counts.append(int((df['name'] == name).sum()))
counts
[3, 2, 2]  # this means that 'dhdn' covers 3 rows, 'subbu' covers 2 rows, etc.
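The same row counts can also be obtained without an explicit loop. The following is only a sketch on made-up data that mirrors the question (already forward-filled); it relies on groupby with sort=False, which keeps the groups in order of first appearance rather than sorting the names.

```python
import pandas as pd

# Made-up data mirroring the question, already forward-filled.
df = pd.DataFrame({
    "name": ["dhdn", "dhdn", "dhdn", "subbu", "subbu", "paddu", "paddu"],
    "sub": ["c", "java", "python", "java", "perl", "c#", "java"],
})

# size() gives the number of rows per group; sort=False preserves
# the order in which each name first appears in the frame.
counts = df.groupby("name", sort=False).size().tolist()
print(counts)  # [3, 2, 2]
```

Because the grouping compares whole values, a name that happens to occur inside a longer name is not counted twice.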
Now we need a function that collects the matching values from the other columns, one list per name.
def get_items(column_number):
    # Slice the column into consecutive chunks, one per name, using the row counts.
    items = []
    lower_bound = 0
    for i in range(len(counts)):
        upper_bound = sum(counts[:i+1])
        items.append(df.iloc[lower_bound:upper_bound, column_number].values.tolist())
        lower_bound = upper_bound
    return items
In short, the function cuts the chosen column into consecutive chunks whose lengths come from counts, so each chunk holds the values for one name. This relies on the rows for each name being contiguous. We now just need to apply it to get one list for subs and one for marks.
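To make the slicing idea concrete, here is a small pure-Python sketch of the same technique outside pandas. The helper name split_by_counts is hypothetical and only for illustration; it assumes the values are already ordered so that each group's rows are contiguous.

```python
from itertools import accumulate

def split_by_counts(values, counts):
    """Split values into consecutive chunks whose lengths are given by counts."""
    ends = list(accumulate(counts))   # running totals: where each chunk ends
    starts = [0] + ends[:-1]          # each chunk starts where the previous one ended
    return [values[s:e] for s, e in zip(starts, ends)]

marks = [90, 95, 98, 92, 91, 80, 81]
chunks = split_by_counts(marks, [3, 2, 2])
print(chunks)  # [[90, 95, 98], [92, 91], [80, 81]]
```

Computing the running totals once avoids re-summing the counts on every iteration, though for a handful of groups the difference is negligible.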
list_sub = get_items(2)    # column 2 holds the subjects
list_marks = get_items(3)  # column 3 holds the marks
Finally, we put it all into one list of dicts.
d = []
for i in range(len(unique_names)):
    diction = {}
    diction['name'] = unique_names[i]
    diction['usn'] = unique_usn[i]
    diction['sub'] = list_sub[i]
    diction['marks'] = list_marks[i]
    d.append(diction)
And voilà!
print(d)
[{'name': 'dhdn', 'usn': '1bm15mca13', 'sub': ['c', 'java', 'python'], 'marks': [90, 95, 98]},
 {'name': 'subbu', 'usn': '1bm15mca14', 'sub': ['java', 'perl'], 'marks': [92, 91]},
 {'name': 'paddu', 'usn': '1bm15mca17', 'sub': ['c#', 'java'], 'marks': [80, 81]}]
Upvotes: 1
Reputation: 357
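I finally solved it. Grouping by two columns makes each group key a (name, usn) tuple, so the two parts have to be taken out of the key separately.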
import pandas as pd
from pprint import pprint
existing_excel_file = 'BHARTI_Model-4_Migration Service parameters - Input sheet_v1.0_DRAFT_26-02-2020.xls'
df_service = pd.read_excel(existing_excel_file, sheet_name='Sheet2')
df_service = df_service.fillna(method='ffill')
result = [
    {'name': k[0], 'usn': k[1], 'sub': v['sub'].tolist(), 'marks': v['marks'].tolist()}
    for k, v in df_service.groupby(['name', 'usn'])
]
pprint(result)
This gives the output I expected:
[{'marks': [90, 95, 98],
  'name': 'dhdn',
  'sub': ['c', 'java', 'python'],
  'usn': '1bm15mca13'},
 {'marks': [80, 81],
  'name': 'paddu',
  'sub': ['c#', 'java'],
  'usn': '1bm15mca17'},
 {'marks': [92, 91],
  'name': 'subbu',
  'sub': ['java', 'perl'],
  'usn': '1bm15mca14'}]
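One difference from the expected output remains: groupby sorts the group keys by default, so paddu comes before subbu. If the original sheet order matters, a possible variation is to pass sort=False and unpack the tuple key directly. The sketch below runs on an in-memory frame with made-up rows mirroring the question instead of the Excel file.

```python
import pandas as pd

# In-memory stand-in for the forward-filled sheet.
df = pd.DataFrame({
    "name": ["dhdn", "dhdn", "dhdn", "subbu", "subbu", "paddu", "paddu"],
    "usn": ["1bm15mca13"] * 3 + ["1bm15mca14"] * 2 + ["1bm15mca17"] * 2,
    "sub": ["c", "java", "python", "java", "perl", "c#", "java"],
    "marks": [90, 95, 98, 92, 91, 80, 81],
})

# Grouping by a list of columns yields (name, usn) tuples as keys;
# sort=False keeps the groups in order of first appearance.
result = [
    {"name": name, "usn": usn, "sub": g["sub"].tolist(), "marks": g["marks"].tolist()}
    for (name, usn), g in df.groupby(["name", "usn"], sort=False)
]
print(result)
```

Printing with print rather than pprint also keeps the keys in insertion order (name, usn, sub, marks), since pprint sorts dictionary keys by default.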
Upvotes: 2