Kris Walker

Reputation: 173

How to split a list into a list of sublists that contain duplicate values in Python?

I have a list of lists of information about tides at certain times each day. It looks kinda like this:

tideData = [
['Thursday 4 January',11.58,0.38],
['Thursday 4 January',16.95,0.73],
['Friday 5 January',6.48,0.83],
['Friday 5 January',12.42,0.33],
['Saturday 6 January',0.5,0.02],
['Saturday 6 January',7.18,0.85],
...
['Friday 2 February',23.52,0.04]
]

I would like to split this list into sublists containing the same dates. In the case above, the list would become:

tideData = [
[['Thursday 4 January',11.58,0.38],
['Thursday 4 January',16.95,0.73]],
[['Friday 5 January',6.48,0.83],
['Friday 5 January',12.42,0.33],
['Friday 5 January',17.92,0.75]],
[['Saturday 6 January',0.5,0.02],
['Saturday 6 January',7.18,0.85]],
...
[['Friday 2 February',23.52,0.04]]
]

Now, this wouldn't be a problem if each date appeared the same number of times. However, some dates appear twice and others appear three times, so I'd like to group the rows into sublists by date. How would I go about this?

Upvotes: 1

Views: 1478

Answers (3)

jpp

Reputation: 164613

You can use collections.defaultdict for an O(n) solution.

From Python 3.7 onward, dictionaries are guaranteed to preserve insertion order, so the groups come out in the order each date first appears in the input. This also works in CPython 3.6, but there it is considered an implementation detail.

from collections import defaultdict

d = defaultdict(list)

for item in tideData:
    d[item[0]].append(item)
    
res = list(d.values())

Result:

[[['Thursday 4 January', 11.58, 0.38], ['Thursday 4 January', 16.95, 0.73]],
 [['Friday 5 January', 6.48, 0.83], ['Friday 5 January', 12.42, 0.33]],
 [['Saturday 6 January', 0.5, 0.02], ['Saturday 6 January', 7.18, 0.85]],
 [['Friday 2 February', 23.52, 0.04]]]
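As a minimal sketch of the ordering point above, the example below uses a hypothetical, interleaved rearrangement of a few rows (the name `group_by_date` is made up for illustration). It assumes Python 3.7+ and shows that the dictionary approach collects rows for a date even when they are not adjacent, with groups ordered by first appearance:

```python
from collections import defaultdict

# Hypothetical sample: the same dates, but interleaved rather than adjacent.
rows = [
    ['Thursday 4 January', 11.58, 0.38],
    ['Friday 5 January', 6.48, 0.83],
    ['Thursday 4 January', 16.95, 0.73],
    ['Friday 5 January', 12.42, 0.33],
]

def group_by_date(rows):
    """Group rows by their first element, keeping first-seen order of dates."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    return list(groups.values())

grouped = group_by_date(rows)
for group in grouped:
    print(group)
```

Both Thursday rows end up in the first sublist and both Friday rows in the second, without sorting the input.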

For those interested in the performance difference between the O(n) and O(n log n) solutions:

from collections import defaultdict
from itertools import groupby
from operator import itemgetter

tideData = [
['Thursday 4 January',11.58,0.38],
['Thursday 4 January',16.95,0.73],
['Friday 5 January',6.48,0.83],
['Friday 5 January',12.42,0.33],
['Saturday 6 January',0.5,0.02],
['Saturday 6 January',7.18,0.85],
['Friday 2 February',23.52,0.04]
]

tideData = tideData * 10000

def jp(tideData):
    d = defaultdict(list)
    for item in tideData:
        d[item[0]].append(item)
    return list(d.values())

def grp(tideData):
    grouper = groupby(sorted(tideData, key=itemgetter(0)), key=itemgetter(0))
    return [list(g) for _, g in grouper]

%timeit jp(tideData)   # 5.63 ms per loop
%timeit grp(tideData)  # 9.87 ms per loop
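The `%timeit` lines above are IPython magics and will not run in a plain Python script. A rough equivalent using the standard `timeit` module might look like the sketch below. The function names, sample size and repeat count are arbitrary choices for illustration, and the timings will differ by machine, so none are shown:

```python
import timeit
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

# Small hypothetical sample, repeated to make the timing measurable.
sample = [
    ['Thursday 4 January', 11.58, 0.38],
    ['Thursday 4 January', 16.95, 0.73],
    ['Friday 5 January', 6.48, 0.83],
    ['Friday 5 January', 12.42, 0.33],
] * 1000

def group_with_dict(rows):
    # O(n): one pass, one dictionary lookup per row
    d = defaultdict(list)
    for row in rows:
        d[row[0]].append(row)
    return list(d.values())

def group_with_sort(rows):
    # O(n log n): sort by date string, then group adjacent equal keys
    key = itemgetter(0)
    return [list(g) for _, g in groupby(sorted(rows, key=key), key=key)]

# timeit.timeit returns the total seconds taken for `number` calls
dict_seconds = timeit.timeit(lambda: group_with_dict(sample), number=20)
sort_seconds = timeit.timeit(lambda: group_with_sort(sample), number=20)
print(f"defaultdict: {dict_seconds:.4f} s, sort + groupby: {sort_seconds:.4f} s")
```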

Upvotes: 2

Aaditya Ura

Reputation: 12669

Here is a simple approach without any imports:

group_by = {}
for row in tideData:
    # start a new list the first time a date is seen, otherwise extend it
    if row[0] not in group_by:
        group_by[row[0]] = [row]
    else:
        group_by[row[0]].append(row)
print(list(group_by.values()))

output:

[[['Thursday 4 January', 11.58, 0.38], ['Thursday 4 January', 16.95, 0.73]], [['Saturday 6 January', 0.5, 0.02], ['Saturday 6 January', 7.18, 0.85]], [['Friday 5 January', 6.48, 0.83], ['Friday 5 January', 12.42, 0.33]], [['Friday 2 February', 23.52, 0.04]]]
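If it helps, the if/else above can be collapsed with `dict.setdefault`, which is a built-in dict method and also needs no import. This is only a sketch on a hypothetical three-row sample (`tide_rows` is a made-up name), and the group order shown by `print` assumes Python 3.7+:

```python
# Hypothetical three-row sample in the same shape as the question's data.
tide_rows = [
    ['Thursday 4 January', 11.58, 0.38],
    ['Thursday 4 January', 16.95, 0.73],
    ['Friday 5 January', 6.48, 0.83],
]

group_by = {}
for row in tide_rows:
    # setdefault returns the list already stored for this date,
    # or stores a new empty list under that key and returns it
    group_by.setdefault(row[0], []).append(row)

grouped = list(group_by.values())
print(grouped)
```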

Upvotes: 0

James

Reputation: 36598

I think you want to use groupby from the itertools module:

from itertools import groupby

tideData = [
['Thursday 4 January',11.58,0.38],
['Thursday 4 January',16.95,0.73],
['Friday 5 January',6.48,0.83],
['Friday 5 January',12.42,0.33],
['Saturday 6 January',0.5,0.02],
['Saturday 6 January',7.18,0.85],
['Friday 2 February',23.52,0.04]
]

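If your data is not already arranged so that rows with the same date are adjacent, you can sort it first (note that this orders the groups alphabetically by the date string, not chronologically):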

tideData = sorted(tideData, key=lambda x: x[0])

before using the following:

[list(g) for _,g in groupby(tideData, key=lambda x: x[0])]
# returns:
[[['Thursday 4 January', 11.58, 0.38], ['Thursday 4 January', 16.95, 0.73]],
 [['Friday 5 January', 6.48, 0.83], ['Friday 5 January', 12.42, 0.33]],
 [['Saturday 6 January', 0.5, 0.02], ['Saturday 6 January', 7.18, 0.85]],
 [['Friday 2 February', 23.52, 0.04]]]
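The reason adjacency matters is that `groupby` only merges consecutive items with equal keys. A small sketch with a hypothetical interleaved sample (not the original data order) illustrates the difference:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample where the two Thursday rows are not adjacent.
rows = [
    ['Thursday 4 January', 11.58, 0.38],
    ['Friday 5 January', 6.48, 0.83],
    ['Thursday 4 January', 16.95, 0.73],
]

date_key = itemgetter(0)

# Without sorting, each run of equal dates becomes its own group.
unsorted_groups = [list(g) for _, g in groupby(rows, key=date_key)]

# After sorting by the date string, equal dates are adjacent and merge.
sorted_groups = [list(g) for _, g in groupby(sorted(rows, key=date_key), key=date_key)]

print(len(unsorted_groups))  # 3 groups: Thursday, Friday, Thursday
print(len(sorted_groups))    # 2 groups: Friday, then Thursday
```

Because the sort key is the date string, the Friday group comes first here; if chronological order matters and the input is already in date order, skipping the sort keeps it.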

Upvotes: 4
