Reputation: 173
I have a list of lists of information about tides at certain times each day. It looks something like this:
tideData = [
['Thursday 4 January',11.58,0.38],
['Thursday 4 January',16.95,0.73],
['Friday 5 January',6.48,0.83],
['Friday 5 January',12.42,0.33],
['Saturday 6 January',0.5,0.02],
['Saturday 6 January',7.18,0.85],
...
['Friday 2 February',23.52,0.04]
]
I would like to split this list into sublists, one per date. In the case above, the list would become:
tideData = [
[['Thursday 4 January',11.58,0.38],
['Thursday 4 January',16.95,0.73]],
[['Friday 5 January',6.48,0.83],
['Friday 5 January',12.42,0.33],
['Friday 5 January',17.92,0.75]],
[['Saturday 6 January',0.5,0.02],
['Saturday 6 January',7.18,0.85]],
...
[['Friday 2 February',23.52,0.04]]
]
Now, this wouldn't be a problem if every date appeared the same number of times. However, some dates appear twice and others three times, so I can't just cut the list into fixed-size chunks. Instead, I'd like to group the rows into sublists by date. How would I go about this?
Upvotes: 1
Views: 1478
Reputation: 164613
You can use collections.defaultdict for an O(n) solution.
In Python 3.7+, dicts are guaranteed to preserve insertion order, so the groups come out in the order in which each date first appears in the input. CPython 3.6 also behaves this way, but there it is considered an implementation detail.
from collections import defaultdict
d = defaultdict(list)
for item in tideData:
    d[item[0]].append(item)
res = list(d.values())
Result:
[[['Thursday 4 January', 11.58, 0.38], ['Thursday 4 January', 16.95, 0.73]],
[['Friday 5 January', 6.48, 0.83], ['Friday 5 January', 12.42, 0.33]],
[['Saturday 6 January', 0.5, 0.02], ['Saturday 6 January', 7.18, 0.85]],
[['Friday 2 February', 23.52, 0.04]]]
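If you also want to look rows up by date, you can keep the dictionary itself instead of converting it to a list. A minimal sketch, assuming a seven-row sample like the one in the question and Python 3.7+ for the ordering (the names by_date and row are just illustrative):

```python
from collections import defaultdict

# Sample rows in the question's format: [date, time, height]
tideData = [
    ['Thursday 4 January', 11.58, 0.38],
    ['Thursday 4 January', 16.95, 0.73],
    ['Friday 5 January', 6.48, 0.83],
    ['Friday 5 January', 12.42, 0.33],
    ['Saturday 6 January', 0.5, 0.02],
    ['Saturday 6 January', 7.18, 0.85],
    ['Friday 2 February', 23.52, 0.04],
]

# Map each date string to the list of rows recorded on that date
by_date = defaultdict(list)
for row in tideData:
    by_date[row[0]].append(row)

# Look up one day's readings directly by its date string
print(by_date['Friday 5 January'])

# Dates in first-seen order (guaranteed on Python 3.7+)
print(list(by_date))
```

Note that indexing a defaultdict with a date that is not present silently inserts an empty list for it; by_date.get(date, []) looks the date up without that side effect.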
For those interested in the performance difference between the O(n) and O(n log n) solutions:
from collections import defaultdict
from itertools import groupby
from operator import itemgetter
tideData = [
['Thursday 4 January',11.58,0.38],
['Thursday 4 January',16.95,0.73],
['Friday 5 January',6.48,0.83],
['Friday 5 January',12.42,0.33],
['Saturday 6 January',0.5,0.02],
['Saturday 6 January',7.18,0.85],
['Friday 2 February',23.52,0.04]
]
tideData = tideData * 10000
def jp(tideData):
    d = defaultdict(list)
    for item in tideData:
        d[item[0]].append(item)
    return list(d.values())
def grp(tideData):
    grouper = groupby(sorted(tideData, key=itemgetter(0)), key=itemgetter(0))
    return [list(g) for _, g in grouper]
%timeit jp(tideData) # 5.63 ms per loop
%timeit grp(tideData) # 9.87 ms per loop
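%timeit is an IPython/Jupyter magic. To run a similar comparison as a plain Python script, here is a minimal sketch using the standard timeit module; the repeat counts are arbitrary choices, and the timings you get will depend on your machine and Python version.

```python
import timeit
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

# Seven sample rows repeated 10000 times, as in the benchmark above
rows = [
    ['Thursday 4 January', 11.58, 0.38],
    ['Thursday 4 January', 16.95, 0.73],
    ['Friday 5 January', 6.48, 0.83],
    ['Friday 5 January', 12.42, 0.33],
    ['Saturday 6 January', 0.5, 0.02],
    ['Saturday 6 January', 7.18, 0.85],
    ['Friday 2 February', 23.52, 0.04],
] * 10000

def jp(tideData):
    """Single pass with a defaultdict: O(n), groups in first-seen order."""
    d = defaultdict(list)
    for item in tideData:
        d[item[0]].append(item)
    return list(d.values())

def grp(tideData):
    """Sort by date string, then groupby: O(n log n), groups in alphabetical order."""
    grouper = groupby(sorted(tideData, key=itemgetter(0)), key=itemgetter(0))
    return [list(g) for _, g in grouper]

# Sanity check: same groups, only the order of the groups differs
assert sorted(jp(rows)) == sorted(grp(rows))

for fn in (jp, grp):
    # Best of 3 repeats, 10 calls each, reported as milliseconds per call
    best = min(timeit.repeat(lambda: fn(rows), number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {best * 1000:.2f} ms per call")
```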
Upvotes: 2
Reputation: 12669
Here is a simple approach without any imports:
group_by = {}
for row in tideData:
    if row[0] not in group_by:
        group_by[row[0]] = [row]
    else:
        group_by[row[0]].append(row)
print(list(group_by.values()))
output (Python 3.7+, where dicts preserve insertion order):
[[['Thursday 4 January', 11.58, 0.38], ['Thursday 4 January', 16.95, 0.73]], [['Friday 5 January', 6.48, 0.83], ['Friday 5 January', 12.42, 0.33]], [['Saturday 6 January', 0.5, 0.02], ['Saturday 6 January', 7.18, 0.85]], [['Friday 2 February', 23.52, 0.04]]]
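The same no-import idea can be written more compactly with dict.setdefault, which returns the list already stored for a key, or inserts and returns a new one if the key is missing. A minimal sketch, assuming the same seven sample rows and Python 3.7+ for the group order:

```python
# Sample rows in the question's format: [date, time, height]
tideData = [
    ['Thursday 4 January', 11.58, 0.38],
    ['Thursday 4 January', 16.95, 0.73],
    ['Friday 5 January', 6.48, 0.83],
    ['Friday 5 January', 12.42, 0.33],
    ['Saturday 6 January', 0.5, 0.02],
    ['Saturday 6 January', 7.18, 0.85],
    ['Friday 2 February', 23.52, 0.04],
]

group_by = {}
for row in tideData:
    # Get this date's list, creating an empty one the first time the date is seen
    group_by.setdefault(row[0], []).append(row)

result = list(group_by.values())
print(result)
```

One trade-off: setdefault builds a new empty list on every call, even when the date is already present, which is one reason collections.defaultdict is often used instead.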
Upvotes: 0
Reputation: 36598
I think you want to use groupby from the itertools module:
from itertools import groupby
tideData = [
['Thursday 4 January',11.58,0.38],
['Thursday 4 January',16.95,0.73],
['Friday 5 January',6.48,0.83],
['Friday 5 January',12.42,0.33],
['Saturday 6 January',0.5,0.02],
['Saturday 6 January',7.18,0.85],
['Friday 2 February',23.52,0.04]
]
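groupby only groups consecutive items with the same key, so if rows with the same date are not already adjacent in your data, sort it first:
tideData = sorted(tideData, key=lambda x: x[0])
Note that this sorts the date strings alphabetically, not chronologically. Then use: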
[list(g) for _,g in groupby(tideData, key=lambda x: x[0])]
# returns:
[[['Thursday 4 January', 11.58, 0.38], ['Thursday 4 January', 16.95, 0.73]],
[['Friday 5 January', 6.48, 0.83], ['Friday 5 January', 12.42, 0.33]],
[['Saturday 6 January', 0.5, 0.02], ['Saturday 6 January', 7.18, 0.85]],
[['Friday 2 February', 23.52, 0.04]]]
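To see why the sort matters, here is a minimal sketch using a hypothetical three-row input in which the two Thursday rows are not adjacent:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical ordering: the same date appears twice, but not consecutively
rows = [
    ['Thursday 4 January', 11.58, 0.38],
    ['Friday 5 January', 6.48, 0.83],
    ['Thursday 4 January', 16.95, 0.73],
]

# Without sorting, groupby starts a new group every time the key changes,
# so the Thursday rows end up in two separate groups
unsorted_groups = [list(g) for _, g in groupby(rows, key=itemgetter(0))]
print(len(unsorted_groups))

# Sorting by the same key first makes equal dates adjacent; sorted() is stable,
# so rows for the same date keep their original relative order
sorted_rows = sorted(rows, key=itemgetter(0))
sorted_groups = [list(g) for _, g in groupby(sorted_rows, key=itemgetter(0))]
print(sorted_groups)
```

Here unsorted_groups has three groups, while sorted_groups has two, with 'Friday 5 January' first because the sort is alphabetical on the date string.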
Upvotes: 4