barciewicz
barciewicz

Reputation: 3783

Splitting dictionary items into smaller dictionaries based on condition

I have two lists: one with partial transactions, and another one with their parent transactions:

partials = [1,2,3,4,5,6,7,8,9,10]
parents = ['a','b','c','d','a','d','f','c','c','a']

I zip those lists into dictionary:

transactions = zip(partials, parents)

As you can see, some partial transactions have same parent transactions.

I need to group the items in dictionary into smaller groups (smaller dictionaries?) in such way that in each group there will be no more than one transaction belonging to one parent. So for example all transactions with parent "a" will need to end up in different groups.

I also need as few groups as possible, because in real world each group will be a file to upload manually.

Expected output would be like this:

Group 1 would contain transactions 1a, 2b,3c, 4d, 7f,

Group 2 would contain transactions 5a, 6d, 8c,

Group 3 would contain transactions 9c, 10a

I have been scratching my head for a while on this and will appreciate any suggestions. So far I do not have any working code to post.

Upvotes: 1

Views: 455

Answers (3)

hygull
hygull

Reputation: 8740

&barciewicz, based on your provided inputs and expected outputs, I've also tried to solve this problem in my way.

Note» I have used OrderedDict() from collections module to retain the order of keys in dictionary. I've also used json module to pretty print the dictionaries, list etc.

I've shown 3 different ways to to get the results from 3 separate functions as follows.

{ "group1": "1a,2b,3c,4d,7f", "group2": "5a,8c,6d", "group3": "10a,9c" }

 

{ "group1": { "a": 1, "b": 2, "c": 3, "d": 4, "f": 7 }, "group2": { "a": 5, "c": 8, "d": 6 }, "group3": { "a": 10, "c": 9 } }

 

{ "a": [ 1, 5, 10 ], "b": [ 2 ], "c": [ 3, 8, 9 ], "d": [ 4, 6 ], "f": [ 7 ] }

 

» Try the below code online at http://rextester.com/OYFF74927.

from collections import OrderedDict
import json; 

def get_formatted_transactions(parents, partials):
    d = OrderedDict();

    for index, partial in enumerate(partials):
        if parents[index] in d:
            l = d[parents[index]]
            l.append(partial)
            d[parents[index]] = l;
        else:
            d[parents[index]] = [partial]

    return d;


def get_groups(transactions):
    i = 1;
    groups = OrderedDict();

    while transactions:
        group_name = "group" + str(i);
        groups[group_name] = {};
        keys = list(transactions.keys());

        for key in keys:
            if transactions[key]:
                groups[group_name][key] = transactions[key].pop(0);
                if not transactions[key]:
                    del transactions[key];
            else:
                del transactions[key]
        i += 1;

    return groups;

def get_comma_separated_data(groups):
    new_dict = OrderedDict();
    for group_name in groups:
        d = groups[group_name]
        new_dict[group_name] = ",".join([str(value) + key  for value, key in zip(d.values(), d.keys())])

    return new_dict;



# Starting point
if __name__ == "__main__":
    partials = [1,2,3,4,5,6,7,8,9,10];
    parents = ['a','b','c','d','a','d','f','c','c','a'];

    transactions = get_formatted_transactions(parents, partials);
    # Pretty pritining ordered dictionary
    print(json.dumps(transactions, indent=4));

    print("\n");

    # Creating groups to organize transactions
    groups = get_groups(transactions)
    # Pretty printing
    print(json.dumps(groups, indent=4))

    print("\n");

    # Get comma separated form 
    comma_separated_data = get_comma_separated_data(groups);
    # Pretty printing
    print(json.dumps(comma_separated_data, indent=4));
Output »
{
    "a": [
        1,
        5,
        10
    ],
    "b": [
        2
    ],
    "c": [
        3,
        8,
        9
    ],
    "d": [
        4,
        6
    ],
    "f": [
        7
    ]
}

{
    "group1": {
        "a": 1,
        "b": 2,
        "c": 3,
        "d": 4,
        "f": 7
    },
    "group2": {
        "a": 5,
        "c": 8,
        "d": 6
    },
    "group3": {
        "a": 10,
        "c": 9
    }
}

{
    "group1": "1a,2b,3c,4d,7f",
    "group2": "5a,8c,6d",
    "group3": "10a,9c"
}

Upvotes: 0

jedwards
jedwards

Reputation: 30210

Here's one approach:

def bin_unique(partials, parents):
    bins = []
    for (ptx,par) in zip(partials, parents):
        pair_assigned = False
        # Try to find an existing bin that doesn't contain the parent.
        for bin_contents in bins:
            if par not in bin_contents:
                bin_contents[par] = (ptx, par)
                pair_assigned = True
                break
        # If we haven't been able to assign the pair, create a new bin
        #   (with the pair as it's first entry)
        if not pair_assigned:
            bins.append({par: (ptx, par)})

    return bins

Usage

partials = [1,2,3,4,5,6,7,8,9,10]
parents = ['a','b','c','d','a','d','f','c','c','a']
binned = bin_unique(partials, parents)

Output

# Print the list of all dicts
print(binned)
# [
#   {'a': (1, 'a'), 'b': (2, 'b'), 'c': (3, 'c'), 'd': (4, 'd'), 'f': (7, 'f')}, 
#   {'a': (5, 'a'), 'd': (6, 'd'), 'c': (8, 'c')}, 
#   {'c': (9, 'c'), 'a': (10, 'a')}
# ]

# You can access the bins via index
print(binned[0])            # {'a': (1, 'a'), 'b': (2, 'b'), 'c': (3, 'c'), 'd': (4, 'd'), 'f': (7, 'f')}
print(len(binned))          # 3

# Each bin is a dictionary, keyed by parent, but the values are the (partial, parent) pair
print(binned[0].keys())     # dict_keys(['a', 'b', 'c', 'd', 'f'])
print(binned[0].values())   # dict_values([(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (7, 'f')])

# To show that all the transactions exist
all_pairs = [pair for b in binned for pair in b.values()]
print(sorted(all_pairs) == sorted(zip(partials, parents)))  # True

Upvotes: 2

DSM
DSM

Reputation: 353149

One way would just be to keep track of how many times you've seen a given parent. The first time you see parent 'a', you add that partial/parent pair to the first group; the second, to the second group, etc.

For example:

def split_into_groups(transactions):
    counts = {}
    out_groups = {}
    for partial, parent in transactions:
        counts[parent] = target = counts.get(parent, 0) + 1
        out_groups.setdefault(target, {})[partial] = parent
    return out_groups

gives me

In [9]: split_into_groups(zip(partials, parents))
Out[9]: 
{1: {1: 'a', 2: 'b', 3: 'c', 4: 'd', 7: 'f'},
 2: {5: 'a', 6: 'd', 8: 'c'},
 3: {9: 'c', 10: 'a'}}

This uses counts.get to get a default value of 0 if the count isn't present yet, and out_groups.setdefault to make a default empty dictionary and put it into out_groups if we haven't seen that target count yet.

If you had to handle the case of duplicate partials, you could replace the setdefault line with

out_groups.setdefault(target, []).append((partial, parent))

which would turn the group members into lists of tuples instead of dictionaries:

{1: [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (7, 'f')],
 2: [(5, 'a'), (6, 'd'), (8, 'c')],
 3: [(9, 'c'), (10, 'a')]}

Upvotes: 1

Related Questions