Reputation: 2952
My data looks as followed:
Application WorkflowStep
0 WF:ACAA-CR (auto) Manager
1 WF:ACAA-CR (auto) Access Responsible
2 WF:ACAA-CR (auto) Automatic
3 WF:ACAA-CR-AccResp (auto) Manager
4 WF:ACAA-CR-AccResp (auto) Access Responsible
5 WF:ACAA-CR-AccResp (auto) Automatic
6 WF:ACAA-CR-IT-AccResp[AUTO] Group
7 WF:ACAA-CR-IT-AccResp[AUTO] Access Responsible
8 WF:ACAA-CR-IT-AccResp[AUTO] Automatic
Additionally to these two columns I want to add a third column showing the sum of all WorkflowStep
's.
The dictionary should look like the following (or similiar):
{'WF:ACAA-CR (auto)':
[{'Workflow': ['Manager', 'Access Responsible','Automatic'], 'Summary': 3}],
'WF:ACAA-CR-AccResp (auto)':
[{'Workflow': ['Manager','Access Responsible','Automatic'], 'Summary': 3}],
'WF:ACAA-CR-IT-AccResp[AUTO]':
[{'Workflow': ['Group','Access Responsible','Automatic'], 'Summary': 3}]
}
My code to create a dictionary out of the two above columns works fine.
for i in range(len(df)):
currentid = df.iloc[i,0]
currentvalue = df.iloc[i,1]
dict.setdefault(currentid, [])
dict[currentid].append(currentvalue)
The code to create the sum of the WorkflowStep
is as followed and also works fine:
for key, values in dict.items():
val = values
match = ["Manager", "Access Responsible", "Automatic", "Group"]
c = Counter(val)
sumofvalues = 0
for m in match:
if c[m] == 1:
sumofvalues += 1
My initial idea was to adjust my first code where the initial key is the Application
and WorkflowStep
, Summary
would be sub-dictionaries.
for i in range(len(df)):
currentid = df.iloc[i,0]
currentvalue = df.iloc[i,1]
dict.setdefault(currentid, [])
dict[currentid].append({"Workflow": [currentvalue], "Summary": []})
The result of this is however unsatisfactory because it does not add currentvalue
to the already existing Workflow
key but recreates them after every iteration.
Example
{'WF:ACAA-CR (auto)': [{'Workflow': ['Manager'], 'Summary': []},
{'Workflow': ['Access Responsible'], 'Summary': []},
{'Workflow': ['Automatic'], 'Summary': []}]
}
How can I create a dictionary similiar to what I wrote above?
Upvotes: 2
Views: 76
Reputation: 2980
I think meW's answer is a much better way of doing things, and takes advantage of the neat power of dataframe's but for reference, if you wanted to do it the way you were trying, I think this will work:
# Create the data for testing.
d = {'Application': ["WF:ACAA-CR (auto)", "WF:ACAA-CR (auto)", "WF:ACAA-CR (auto)",
"WF:ACAA-CR-AccResp (auto)", "WF:ACAA-CR-AccResp (auto)", "WF:ACAA-CR-AccResp (auto)"],
'WorkflowStep': ["Manager", "Access Responsible","Automatic","Manager","Access Responsible", "Automatic"]}
df = pd.DataFrame(d)
new_dict = dict()
# Iterate through the rows of the data frame.
for index, row in df.iterrows():
# Get the values for the current row.
current_application_id = row['Application']
current_workflowstep = row['WorkflowStep']
# Set the default values if not already set.
new_dict.setdefault(current_application_id, {'Workflow': [], 'Summary' : 0})
# Add the new values.
new_dict[current_application_id]['Workflow'].append(current_workflowstep)
new_dict[current_application_id]['Summary'] += 1
print(new_dict)
Which gives an output of:
{'WF:ACAA-CR (auto)': {'Workflow': ['Manager', 'Access Responsible', 'Automatic'], 'Summary': 3},
'WF:ACAA-CR-AccResp (auto)': {'Workflow': ['Manager', 'Access Responsible', 'Automatic'], 'Summary': 3}}
Upvotes: 0
Reputation: 3967
IIUC, here's what can help -
val = df.groupby('Application')['WorkflowStep'].unique()
{val.index[i]: [{'WorkflowStep':list(val[i]), 'Summary':len(val[i])}] for i in range(len(val))}
resulting into,
{'WF:ACAA-CR (auto)': [{'WorkflowStep': ['Manager', 'Access Responsible', 'Automatic'], 'Summary': 3}],
'WF:ACAA-CR-AccResp (auto)': [{'WorkflowStep': ['Manager', 'Access Responsible', 'Automatic'], 'Summary': 3}],
'WF:ACAA-CR-IT-AccResp[AUTO]': [{'WorkflowStep': ['Group', 'Access Responsible', 'Automatic'], 'Summary': 3}]}
Upvotes: 4