Alex_P

Reputation: 2952

Iteration issue to create a nested dictionary

My data looks as follows:

                 Application                       WorkflowStep
0                WF:ACAA-CR (auto)                      Manager
1                WF:ACAA-CR (auto)           Access Responsible
2                WF:ACAA-CR (auto)                    Automatic
3                WF:ACAA-CR-AccResp (auto)              Manager
4                WF:ACAA-CR-AccResp (auto)   Access Responsible
5                WF:ACAA-CR-AccResp (auto)            Automatic
6                WF:ACAA-CR-IT-AccResp[AUTO]              Group
7                WF:ACAA-CR-IT-AccResp[AUTO] Access Responsible
8                WF:ACAA-CR-IT-AccResp[AUTO]          Automatic

In addition to these two columns, I want to add a third value showing the total number of WorkflowStep entries per Application. The dictionary should look like the following (or similar):

{'WF:ACAA-CR (auto)': 
             [{'Workflow': ['Manager', 'Access Responsible','Automatic'], 'Summary': 3}], 
 'WF:ACAA-CR-AccResp (auto)': 
             [{'Workflow': ['Manager','Access Responsible','Automatic'], 'Summary': 3}], 
 'WF:ACAA-CR-IT-AccResp[AUTO]': 
             [{'Workflow': ['Group','Access Responsible','Automatic'], 'Summary': 3}]
}

My code to create a dictionary out of the two columns above works fine.

dict = {}  # note: this name shadows the built-in dict type
for i in range(len(df)):
    currentid = df.iloc[i,0]
    currentvalue = df.iloc[i,1]
    dict.setdefault(currentid, [])
    dict[currentid].append(currentvalue)

The code to compute the sum of the WorkflowSteps is as follows and also works fine:

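from collections import Counter

for key, values in dict.items():
    match = ["Manager", "Access Responsible", "Automatic", "Group"]
    # Count how often each step occurs for this application.
    c = Counter(values)
    sumofvalues = 0
    for m in match:
        # Only steps that occur exactly once are counted.
        if c[m] == 1:
            sumofvalues += 1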

My initial idea was to adjust my first code so that the top-level key is the Application, with Workflow and Summary as keys of a sub-dictionary.

for i in range(len(df)):
    currentid = df.iloc[i,0]
    currentvalue = df.iloc[i,1]
    dict.setdefault(currentid, [])
    dict[currentid].append({"Workflow": [currentvalue], "Summary": []})

The result is unsatisfactory, however, because it does not add currentvalue to the already existing Workflow list; instead it appends a new sub-dictionary on every iteration.

Example

 {'WF:ACAA-CR (auto)': [{'Workflow': ['Manager'], 'Summary': []},
                        {'Workflow': ['Access Responsible'], 'Summary': []}, 
                        {'Workflow': ['Automatic'], 'Summary': []}]
 }

How can I create a dictionary similar to the one I described above?

Upvotes: 2

Views: 76

Answers (2)

Andrew McDowell

Reputation: 2980

I think meW's answer is a much better way of doing things, since it takes advantage of the power of DataFrames. For reference, though, if you want to do it the way you were trying, I think this will work:

import pandas as pd

# Create the data for testing.
d = {'Application': ["WF:ACAA-CR (auto)", "WF:ACAA-CR (auto)", "WF:ACAA-CR (auto)",
                     "WF:ACAA-CR-AccResp (auto)", "WF:ACAA-CR-AccResp (auto)", "WF:ACAA-CR-AccResp (auto)"],
     'WorkflowStep': ["Manager", "Access Responsible","Automatic","Manager","Access Responsible", "Automatic"]}
df = pd.DataFrame(d)

new_dict = dict()
# Iterate through the rows of the data frame. 
for index, row in df.iterrows():
    # Get the values for the current row.
    current_application_id = row['Application']
    current_workflowstep = row['WorkflowStep']

    # Set the default values if not already set.
    new_dict.setdefault(current_application_id, {'Workflow': [], 'Summary' : 0})

    # Add the new values.
    new_dict[current_application_id]['Workflow'].append(current_workflowstep)
    new_dict[current_application_id]['Summary'] += 1

print(new_dict)

Which gives an output of:

{'WF:ACAA-CR (auto)': {'Workflow': ['Manager', 'Access Responsible', 'Automatic'], 'Summary': 3}, 
'WF:ACAA-CR-AccResp (auto)': {'Workflow': ['Manager', 'Access Responsible', 'Automatic'], 'Summary': 3}}
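
One caveat: Summary above counts rows, so a step that appears twice for the same application would be counted twice, and the value is a plain dict rather than the single-element list shown in the question. If that matters, a minimal sketch along the following lines may help. The helper name build_nested is hypothetical, and it takes plain (application, step) pairs instead of a DataFrame so that it runs without pandas:

```python
def build_nested(rows):
    """Group (application, step) pairs into the nested shape from the question."""
    nested = {}
    for application, step in rows:
        # One single-element list per application, holding the inner dict.
        entry = nested.setdefault(application, [{'Workflow': [], 'Summary': 0}])[0]
        # Skip steps already recorded so Summary counts distinct steps only.
        if step not in entry['Workflow']:
            entry['Workflow'].append(step)
            entry['Summary'] = len(entry['Workflow'])
    return nested


rows = [
    ("WF:ACAA-CR (auto)", "Manager"),
    ("WF:ACAA-CR (auto)", "Access Responsible"),
    ("WF:ACAA-CR (auto)", "Automatic"),
    ("WF:ACAA-CR (auto)", "Manager"),  # duplicate row, not counted again
]
print(build_nested(rows))
```

With a DataFrame, the same helper could be called as build_nested(zip(df['Application'], df['WorkflowStep'])).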

Upvotes: 0

meW

Reputation: 3967

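IIUC, this should help:

# Collect the distinct workflow steps for each application.
val = df.groupby('Application')['WorkflowStep'].unique()
# Iterate over (label, value) pairs; integer positions such as val[i] on a
# string-labelled Series are deprecated in recent pandas versions.
{app: [{'WorkflowStep': list(steps), 'Summary': len(steps)}] for app, steps in val.items()}

which results in: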

{'WF:ACAA-CR (auto)': [{'WorkflowStep': ['Manager', 'Access Responsible', 'Automatic'], 'Summary': 3}],
 'WF:ACAA-CR-AccResp (auto)': [{'WorkflowStep': ['Manager', 'Access Responsible', 'Automatic'], 'Summary': 3}],
 'WF:ACAA-CR-IT-AccResp[AUTO]': [{'WorkflowStep': ['Group', 'Access Responsible', 'Automatic'], 'Summary': 3}]}
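
If the inner key should be 'Workflow' as in the question, and the applications should keep their order of first appearance rather than being sorted, a variant along these lines might work. This is only a sketch: the sample frame is rebuilt from the question's data, and the names steps and result are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Application': ["WF:ACAA-CR (auto)"] * 3
                   + ["WF:ACAA-CR-AccResp (auto)"] * 3
                   + ["WF:ACAA-CR-IT-AccResp[AUTO]"] * 3,
    'WorkflowStep': ["Manager", "Access Responsible", "Automatic",
                     "Manager", "Access Responsible", "Automatic",
                     "Group", "Access Responsible", "Automatic"],
})

# Drop repeated (Application, WorkflowStep) rows, then collect the steps of
# each application into a list; sort=False keeps first-appearance order.
steps = (
    df[['Application', 'WorkflowStep']]
    .drop_duplicates()
    .groupby('Application', sort=False)['WorkflowStep']
    .agg(list)
)

# Wrap each list in the single-element list of dicts from the question.
result = {
    app: [{'Workflow': workflow, 'Summary': len(workflow)}]
    for app, workflow in steps.items()
}
print(result)
```

Because duplicates are dropped first, Summary here is the number of distinct steps per application, the same count as in the unique() version above.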

Upvotes: 4
