Reputation: 37
I am working on a project and have the following list:
a = ['2 co', '2 tr', '2 pi', '2 ca', '3 co', '3 ca', '3 pi', '6 tr', '6 pi', '8 tr', '7 ca', '7 pi']
I want to run code that checks whether the first character of each string is also the first character of another string, and collects the matching strings in a new list.
I know how to do it, but only for two strings. Here, I want to select all the strings that start with the same character and group them by a chosen size. For example, I want every possible combination of 3 strings (taken from the original list) that start with the same character, with each combination joined into a single string.
Also, the result should contain each set of strings only once, not several combinations made of the same strings in different orders.
The expected result in that case (i.e. when I want groups of 3 strings and with a = ['2 co', '2 tr', '2 pi', '2 ca', '3 co', '3 ca', '3 pi', '6 tr', '6 pi', '8 tr', '7 ca', '7 pi']) is:
['2 co, 2 tr, 2 pi', '2 co, 2 tr, 2 ca', '2 pi, 2 ca, 2 tr', '2 pi, 2 ca, 2 co', '3 co, 3 ca, 3 pi']
You can see that '2 tr, 2 co, 2 pi' is not included, because I already have '2 co, 2 tr, 2 pi'.
And when I want groups of 4, the expected output is:
['2 co, 2 tr, 2 pi, 2 ca']
I managed to do it, but only when grouping by pairs, and my code returns every match, including the ones made of the same strings in a different order. Here it is:
a = ['2 co', '2 tr', '2 pi', '2 ca', '3 co', '3 ca', '3 pi', '6 tr', '6 pi', '8 tr', '7 ca', '7 pi']
result = []
for i in range(len(a)):
    # compare a[i] with every other element of the list
    for j in a[:i] + a[i+1:]:
        if a[i][0] == j[0]:
            result.append(j)
print(result)
Thanks for your help!
Upvotes: 2
Views: 94
Reputation: 36249
You can use itertools.groupby and itertools.combinations for that task:
import itertools as it
import operator as op
groups = it.groupby(sorted(a), key=op.itemgetter(0))
result = [', '.join(c) for g in groups for c in it.combinations(g[1], 3)]
Note that if the order of elements should depend only on the first character, you might want to pass key=op.itemgetter(0) to the sorted function as well. Because sorted is stable, items within each group then keep their original relative order, which is the order shown in the outputs below. If the data is already presorted such that "similar" items (with the same first character) are next to each other, you can drop the sorted call altogether.
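To make the effect of that extra key concrete, here is a small sketch. It assumes the list from the question with no leading spaces in its items; the names plain and by_first are only illustrative.

```python
import operator as op

# The list from the question (assumed to have no stray leading spaces).
a = ['2 co', '2 tr', '2 pi', '2 ca', '3 co', '3 ca', '3 pi',
     '6 tr', '6 pi', '8 tr', '7 ca', '7 pi']

# Plain sort: compares whole strings, so items are reordered inside a group.
plain = sorted(a)

# Sort on the first character only: the sort is stable, so items that share
# a first character keep the order they had in the original list.
by_first = sorted(a, key=op.itemgetter(0))

print(plain[:4])     # ['2 ca', '2 co', '2 pi', '2 tr']
print(by_first[:4])  # ['2 co', '2 tr', '2 pi', '2 ca']
```

Both versions put equal first characters next to each other, which is all groupby needs; they only differ in the order of the items inside each combination.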
it.groupby puts the data into groups based on their first character (due to key=op.itemgetter(0), which selects the first item, i.e. the first character, from each string). Expanding groups, it looks like this:
[('2', ['2 co', '2 tr', '2 pi', '2 ca']),
('3', ['3 co', '3 ca', '3 pi']),
('6', ['6 tr', '6 pi']),
('7', ['7 ca', '7 pi']),
('8', ['8 tr'])]
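If you want to look at the groups yourself, something along these lines should work. It is only a sketch: it assumes the list has no leading spaces and sorts on the first character so that the original order inside each group is kept; expanded and first_char are illustrative names.

```python
import itertools as it
import operator as op

a = ['2 co', '2 tr', '2 pi', '2 ca', '3 co', '3 ca', '3 pi',
     '6 tr', '6 pi', '8 tr', '7 ca', '7 pi']

first_char = op.itemgetter(0)
groups = it.groupby(sorted(a, key=first_char), key=first_char)

# Each item produced by groupby is a (key, iterator) pair, so the inner
# iterator has to be turned into a list to be displayed.
expanded = [(key, list(members)) for key, members in groups]

for key, members in expanded:
    print(key, members)
```

Keep in mind that groupby returns a one-shot iterator: once it has been expanded like this, groups is exhausted and has to be rebuilt before it can be used for the combinations.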
Then, for each of the groups, it.combinations(..., 3) computes all possible combinations of length 3, and the list comprehension joins each of them into a single string (for groups with fewer than 3 members no combinations are possible):
['2 co, 2 tr, 2 pi',
'2 co, 2 tr, 2 ca',
'2 co, 2 pi, 2 ca',
'2 tr, 2 pi, 2 ca',
'3 co, 3 ca, 3 pi']
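Since the question also asks about groups of 4, the same idea can be wrapped in a function that takes the group size as a parameter. This is just a sketch: same_prefix_combinations is a name I made up, and it sorts on the first character only so that the original order inside each group is kept.

```python
import itertools as it
import operator as op

def same_prefix_combinations(items, size):
    """Join every combination of `size` items that share a first character."""
    first_char = op.itemgetter(0)
    # groupby only merges neighbouring items, so sort by the same key first.
    ordered = sorted(items, key=first_char)
    return [', '.join(combo)
            for _, members in it.groupby(ordered, key=first_char)
            for combo in it.combinations(members, size)]

a = ['2 co', '2 tr', '2 pi', '2 ca', '3 co', '3 ca', '3 pi',
     '6 tr', '6 pi', '8 tr', '7 ca', '7 pi']

print(same_prefix_combinations(a, 3))
print(same_prefix_combinations(a, 4))  # ['2 co, 2 tr, 2 pi, 2 ca']
```

it.combinations never yields the same set of items twice in a different order, which is why '2 tr, 2 co, 2 pi' does not show up next to '2 co, 2 tr, 2 pi'.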
Upvotes: 4