Reputation: 3783
I have two lists: one with partial transactions, and another one with their parent transactions:
partials = [1,2,3,4,5,6,7,8,9,10]
parents = ['a','b','c','d','a','d','f','c','c','a']
I zip those lists into dictionary:
transactions = zip(partials, parents)
As you can see, some partial transactions have same parent transactions.
I need to group the items in dictionary into smaller groups (smaller dictionaries?) in such way that in each group there will be no more than one transaction belonging to one parent. So for example all transactions with parent "a" will need to end up in different groups.
I also need as few groups as possible, because in real world each group will be a file to upload manually.
Expected output would be like this:
Group 1 would contain transactions 1a, 2b,3c, 4d, 7f,
Group 2 would contain transactions 5a, 6d, 8c,
Group 3 would contain transactions 9c, 10a
I have been scratching my head for a while on this and will appreciate any suggestions. So far I do not have any working code to post.
Upvotes: 1
Views: 455
Reputation: 8740
&barciewicz, based on your provided inputs and expected outputs, I've also tried to solve this problem in my way.
Note» I have used OrderedDict() from collections module to retain the order of keys in dictionary. I've also used json module to pretty print the dictionaries, list etc.
I've shown 3 different ways to to get the results from 3 separate functions as follows.
{ "group1": "1a,2b,3c,4d,7f", "group2": "5a,8c,6d", "group3": "10a,9c" }
{ "group1": { "a": 1, "b": 2, "c": 3, "d": 4, "f": 7 }, "group2": { "a": 5, "c": 8, "d": 6 }, "group3": { "a": 10, "c": 9 } }
{ "a": [ 1, 5, 10 ], "b": [ 2 ], "c": [ 3, 8, 9 ], "d": [ 4, 6 ], "f": [ 7 ] }
» Try the below code online at http://rextester.com/OYFF74927.
from collections import OrderedDict
import json;
def get_formatted_transactions(parents, partials):
d = OrderedDict();
for index, partial in enumerate(partials):
if parents[index] in d:
l = d[parents[index]]
l.append(partial)
d[parents[index]] = l;
else:
d[parents[index]] = [partial]
return d;
def get_groups(transactions):
i = 1;
groups = OrderedDict();
while transactions:
group_name = "group" + str(i);
groups[group_name] = {};
keys = list(transactions.keys());
for key in keys:
if transactions[key]:
groups[group_name][key] = transactions[key].pop(0);
if not transactions[key]:
del transactions[key];
else:
del transactions[key]
i += 1;
return groups;
def get_comma_separated_data(groups):
new_dict = OrderedDict();
for group_name in groups:
d = groups[group_name]
new_dict[group_name] = ",".join([str(value) + key for value, key in zip(d.values(), d.keys())])
return new_dict;
# Starting point
if __name__ == "__main__":
partials = [1,2,3,4,5,6,7,8,9,10];
parents = ['a','b','c','d','a','d','f','c','c','a'];
transactions = get_formatted_transactions(parents, partials);
# Pretty pritining ordered dictionary
print(json.dumps(transactions, indent=4));
print("\n");
# Creating groups to organize transactions
groups = get_groups(transactions)
# Pretty printing
print(json.dumps(groups, indent=4))
print("\n");
# Get comma separated form
comma_separated_data = get_comma_separated_data(groups);
# Pretty printing
print(json.dumps(comma_separated_data, indent=4));
Output »
{
"a": [
1,
5,
10
],
"b": [
2
],
"c": [
3,
8,
9
],
"d": [
4,
6
],
"f": [
7
]
}
{
"group1": {
"a": 1,
"b": 2,
"c": 3,
"d": 4,
"f": 7
},
"group2": {
"a": 5,
"c": 8,
"d": 6
},
"group3": {
"a": 10,
"c": 9
}
}
{
"group1": "1a,2b,3c,4d,7f",
"group2": "5a,8c,6d",
"group3": "10a,9c"
}
Upvotes: 0
Reputation: 30210
Here's one approach:
def bin_unique(partials, parents):
bins = []
for (ptx,par) in zip(partials, parents):
pair_assigned = False
# Try to find an existing bin that doesn't contain the parent.
for bin_contents in bins:
if par not in bin_contents:
bin_contents[par] = (ptx, par)
pair_assigned = True
break
# If we haven't been able to assign the pair, create a new bin
# (with the pair as it's first entry)
if not pair_assigned:
bins.append({par: (ptx, par)})
return bins
Usage
partials = [1,2,3,4,5,6,7,8,9,10]
parents = ['a','b','c','d','a','d','f','c','c','a']
binned = bin_unique(partials, parents)
Output
# Print the list of all dicts
print(binned)
# [
# {'a': (1, 'a'), 'b': (2, 'b'), 'c': (3, 'c'), 'd': (4, 'd'), 'f': (7, 'f')},
# {'a': (5, 'a'), 'd': (6, 'd'), 'c': (8, 'c')},
# {'c': (9, 'c'), 'a': (10, 'a')}
# ]
# You can access the bins via index
print(binned[0]) # {'a': (1, 'a'), 'b': (2, 'b'), 'c': (3, 'c'), 'd': (4, 'd'), 'f': (7, 'f')}
print(len(binned)) # 3
# Each bin is a dictionary, keyed by parent, but the values are the (partial, parent) pair
print(binned[0].keys()) # dict_keys(['a', 'b', 'c', 'd', 'f'])
print(binned[0].values()) # dict_values([(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (7, 'f')])
# To show that all the transactions exist
all_pairs = [pair for b in binned for pair in b.values()]
print(sorted(all_pairs) == sorted(zip(partials, parents))) # True
Upvotes: 2
Reputation: 353149
One way would just be to keep track of how many times you've seen a given parent. The first time you see parent 'a', you add that partial/parent pair to the first group; the second, to the second group, etc.
For example:
def split_into_groups(transactions):
counts = {}
out_groups = {}
for partial, parent in transactions:
counts[parent] = target = counts.get(parent, 0) + 1
out_groups.setdefault(target, {})[partial] = parent
return out_groups
gives me
In [9]: split_into_groups(zip(partials, parents))
Out[9]:
{1: {1: 'a', 2: 'b', 3: 'c', 4: 'd', 7: 'f'},
2: {5: 'a', 6: 'd', 8: 'c'},
3: {9: 'c', 10: 'a'}}
This uses counts.get
to get a default value of 0 if the count isn't present yet, and out_groups.setdefault
to make a default empty dictionary and put it into out_groups if we haven't seen that target count yet.
If you had to handle the case of duplicate partials, you could replace the setdefault line with
out_groups.setdefault(target, []).append((partial, parent))
which would turn the group members into lists of tuples instead of dictionaries:
{1: [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (7, 'f')],
2: [(5, 'a'), (6, 'd'), (8, 'c')],
3: [(9, 'c'), (10, 'a')]}
Upvotes: 1