Merging two DataFrame columns that contain dictionaries and storing the result in another column

Question

I have a data frame df with two columns, a and b, that contain dictionaries. I want to merge the two dictionaries in each row and store the merged dictionary in a new column c. One sample data point is:

df :

a                 b                 c
------------------------------------------------------------
{x:{y:{z:u}}}     {w:{f:{h:l}}}     {x:{y:{z:u}}, w:{f:{h:l}}}

I have a and b, and I want c. I have a function that merges two dictionaries, but I am not able to assign the merged dictionaries to column c.

The function that I have for merging two dictionaries:

# Function for merging two dictionaries and adding values if keys are same
def merge_and_add(dict1, dict2):
    # We loop over the key and value pairs of the second dictionary...
    for k, v in dict2.items():
        # If the key is also found in the keys of the first dictionary, and...
        if k in dict1.keys():
            # If  the value is a dictionary...
            if isinstance(v, dict):
                # we pass this value to the merge_and_add function, together with the value of first dictionary with
                # the same key and we overwrite this value with the output.
                dict1[k] = merge_and_add(dict1[k], v)
            # If the value is an integer...
            elif isinstance(v, int):
                # we add the value of the key value pair of the second dictionary to the value of the first 
                # dictionary with the same key.
                dict1[k] = dict1[k] + v
        # If the key is not found, the key and value of the second should be appended to the first dictionary
        else:
            dict1[k] = v
    # return the first dictionary
    return dict1

I am trying the following but it isn't working:

df_curr_hist['latest'] = np.nan
def latest(df):
    idx = 0
    while idx < len(df):
        curr_dict = df.iloc[idx]['current']
        hist_dict = df.iloc[idx]['history']
        df.latest[idx] = merge_and_add(curr_dict, hist_dict)
    return df

Roshan Santhosh · Accepted Answer

First, note that Python does not copy a dictionary when it is passed as a function argument; the function receives a reference to the same object. In your current code, merge_and_add therefore modifies the original dictionary passed as its first argument while building the combined one. You can avoid that by working on copies of the dictionaries.
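As a small illustration of that point, the sketch below uses a hypothetical helper, add_key (not part of the question's code), to show that a function which modifies its dictionary argument also changes the caller's dictionary, while passing a copy leaves the caller's top-level keys alone. Keep in mind that dict.copy() is shallow, so nested dictionaries are still shared unless they are copied as well.

```python
def add_key(d):
    # Hypothetical helper: modifies the dictionary it receives in place
    d['new'] = 1
    return d


original = {'a': {'b': 2}}

# Passing the dictionary itself: the caller's object is changed
result = add_key(original)
mutated_in_place = 'new' in original

# Passing a shallow copy: the caller's top-level keys stay as they were
fresh = {'a': {'b': 2}}
result_from_copy = add_key(fresh.copy())
copy_left_untouched = 'new' not in fresh

print(mutated_in_place, copy_left_untouched)
```

The version of merge_and_add in this answer relies on the same idea by copying its arguments before changing anything.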

import pandas as pd

x = {'a': {'b': 2}}
y = {'c': {'e': 4}}

e = pd.DataFrame({'a': [x], 'b': [y]})

def merge_and_add(x, y):
    # Work on shallow copies so the dictionaries passed in are not modified
    dict1 = x.copy()
    dict2 = y.copy()
    # We loop over the key and value pairs of the second dictionary...
    for k, v in dict2.items():
        # If the key is also found in the keys of the first dictionary, and...
        if k in dict1.keys():
            # If  the value is a dictionary...
            if isinstance(v, dict):
                # we pass this value to the merge_and_add function, together with the value of first dictionary with
                # the same key and we overwrite this value with the output.
                dict1[k] = merge_and_add(dict1[k], v)
            # If the value is an integer...
            elif isinstance(v, int):
                # we add the value of the key value pair of the second dictionary to the value of the first 
                # dictionary with the same key.
                dict1[k] = dict1[k] + v
        # If the key is not found, the key and value of the second should be appended to the first dictionary
        else:
            dict1[k] = v
    # return the first dictionary
    return dict1


# Apply the merge row by row and store the result in a new column
e['c'] = e.apply(lambda row: merge_and_add(row.a, row.b), axis=1)

The final output looks like this:

                 a                b                               c
0  {'a': {'b': 2}}  {'c': {'e': 4}}  {'a': {'b': 2}, 'c': {'e': 4}}
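For a case with more than one row and overlapping keys, the following sketch may help. It assumes a compact variant of the function, named merge_nested here purely for illustration, that deep-copies its first argument (copy.deepcopy is a swapped-in choice, not what the code above uses) and builds the new column with a list comprehension over zip instead of apply. The sample data is made up.

```python
import copy

import pandas as pd


def merge_nested(first, second):
    # Illustrative variant: deep-copy so neither input is modified or shared
    merged = copy.deepcopy(first)
    for key, value in second.items():
        if key not in merged:
            # Key only exists in the second dictionary: copy it over
            merged[key] = copy.deepcopy(value)
        elif isinstance(value, dict) and isinstance(merged[key], dict):
            # Both sides hold dictionaries: merge them recursively
            merged[key] = merge_nested(merged[key], value)
        elif isinstance(value, int) and isinstance(merged[key], int):
            # Both sides hold integers: add them
            merged[key] = merged[key] + value
        # Any other clash keeps the value from the first dictionary
    return merged


# Made-up sample data: row 0 has overlapping keys, row 1 does not
df = pd.DataFrame({
    'a': [{'x': {'y': 1}}, {'p': {'q': 2}}],
    'b': [{'x': {'y': 10, 'z': 5}}, {'r': {'s': 3}}],
})

# Merge the two dictionaries of each row and store the result in column c
df['c'] = [merge_nested(a, b) for a, b in zip(df['a'], df['b'])]

print(df['c'].tolist())
```

In the first row the shared key y is summed and z is carried over from b, while columns a and b keep their original contents.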
