redacted code

Reputation: 192

Create dictionary of dicts from a single list - Python3

Python 3.6.5/3.7.1 on Linux

Struggling to create a dictionary with dictionaries as values.

I want to create a dictionary from a list of date & time data (ultimately to create charts with bokeh).

This must have been asked before, but I can't find a set of search terms that returns a result that clarifies matters for me.

nb I'm essentially a hobby coder, & I don't easily think algorithmically like a real programmer.

The data is in a list (max 3200 items): Each item is a record of the occurrence of an event on a date in a clock period of one hour.

Thus; ['03/01/19 09:00', '03/01/19 09:00', '03/01/19 09:00',] indicates 3 events between 0900-1000 on 03/01/2019.

Only clock periods with events are recorded, so if no event, no timestamp.

nb date format is dd/mm/yy

Example data:

dtl = [
    '06/01/19 12:00', '06/01/19 12:00', '06/01/19 11:00', '05/01/19 21:00',
    '05/01/19 17:00', '05/01/19 17:00', '05/01/19 14:00', '03/01/19 21:00',
    '03/01/19 17:00', '03/01/19 12:00', '03/01/19 12:00', '03/01/19 12:00',
    '03/01/19 12:00', '03/01/19 12:00', '03/01/19 11:00', '03/01/19 10:00',
    '03/01/19 10:00', '03/01/19 09:00','03/01/19 09:00','03/01/19 09:00',
]

The desired dictionary would look like this:

dtd = {
    '03/01/19': {
         '00': 0, '01': 0, '02': 0, '03': 0, '04': 0, '05': 0,
         '06': 0, '07': 0, '08': 0, '09': 3, '10': 2, '11': 1,
         '12': 5, '13': 0, '14': 0, '15': 0, '16': 0, '17': 1,
         '18': 0, '19': 0, '20': 0, '21': 1, '22': 0, '23': 0,
     },
     '04/01/19': {
         '00': 0, ... '23': 0
     },
     '05/01/19': {
         '00': 0, ... 
     } ... etc
}

Clearly I can initialise a dictionary with at least the keys:

{i.split()[0]:{} for i in dtl}

But then I can't get my head round what I need to do to update the subdicts with the counts, & so can't see a way to get from the original list to the desired dictionary. I'm going round in circles!

Upvotes: 1

Views: 147

Answers (3)

Mad Physicist

Reputation: 114548

You could combine a Counter with a defaultdict to do this pretty effectively once you have split into a dictionary by date. So first split by date:

from collections import Counter, defaultdict

dtd = defaultdict(list)
for date, time in (item.split() for item in dtl):
    dtd[date].append(time[:2])

Now you can easily count the existing items, and use them to initialize a defaultdict that will return zeros for the missing times:

for key in dtd:
    dtd[key] = defaultdict(int, Counter(dtd[key]))

The result is:

defaultdict(list, {
    '03/01/19': defaultdict(int, {
        '09': 3,
        '10': 2,
        '11': 1,
        '12': 5,
        '17': 1,
        '21': 1
    }),
    '05/01/19': defaultdict(int, {'14': 1, '17': 2, '21': 1}),
    '06/01/19': defaultdict(int, {'11': 1, '12': 2})
})
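To make the "returns zeros for the missing times" behaviour concrete, here is a minimal sketch on a made-up list of hours (the names hours_seen and counts are illustrative, not from the answer). It also shows a side effect worth knowing about: looking up a missing key with [] stores that key in the defaultdict, while .get() does not.

```python
from collections import Counter, defaultdict

# Illustrative data: hours seen on one day, like the lists built in the first step
hours_seen = ['09', '09', '09', '10', '10', '11']
counts = defaultdict(int, Counter(hours_seen))

print(counts['09'])        # an hour that is present in the data
print('13' in counts)      # not stored yet
print(counts['13'])        # missing hour falls back to int(), i.e. 0
print('13' in counts)      # the [] lookup above inserted the key
print(counts.get('14', 0), '14' in counts)  # .get() reads without inserting
```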

Since the objects here are defaultdicts, you will be able to query dates and times that were not in the original dataset. You can avoid this by converting the result to a regular dict containing only the keys you want after you finish:

hours = ['%02d' % h for h in range(24)]
dtd = {date: {h: d[h] for h in hours} for date, d in dtd.items()}
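Note that this only covers dates that occur in the data, whereas the desired output in the question also lists an all-zero '04/01/19'. A possible sketch for filling in such gaps follows. It assumes the dates are zero-padded dd/mm/yy strings and that every day between the earliest and latest date should appear; hourly_counts, FMT and sample are names made up for this illustration.

```python
from collections import Counter
from datetime import datetime, timedelta

FMT = '%d/%m/%y'  # assumption: day/month/two-digit year, as stated in the question

def hourly_counts(stamps):
    """Return {date: {'00'..'23': count}} for every day from first to last date."""
    # Count each (date, hour) pair in a single pass
    counts = Counter((s.split()[0], s.split()[1][:2]) for s in stamps)
    if not counts:
        return {}
    days = [datetime.strptime(date, FMT) for date, _ in counts]
    hours = ['%02d' % h for h in range(24)]
    result = {}
    day = min(days)
    while day <= max(days):
        key = day.strftime(FMT)
        # A Counter returns 0 for unseen keys and does not store them
        result[key] = {h: counts[(key, h)] for h in hours}
        day += timedelta(days=1)
    return result

sample = ['05/01/19 17:00', '05/01/19 17:00', '03/01/19 09:00']
result = hourly_counts(sample)
print(list(result))
print(result['05/01/19']['17'], sum(result['04/01/19'].values()))
```

Walking the calendar with datetime rather than sorting the strings keeps the dates in true chronological order, which plain string sorting of dd/mm/yy would not do across months.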

Upvotes: 2

hqkhan

Reputation: 483

I'd suggest the use of collections.defaultdict since some of your counts can be 0.

Here's an option:

from collections import defaultdict

dtl = ['06/01/19 12:00', '06/01/19 12:00', '06/01/19 11:00', 
       '05/01/19 21:00', '05/01/19 17:00', '05/01/19 17:00', 
       '05/01/19 14:00', '03/01/19 21:00', '03/01/19 17:00',
       '03/01/19 12:00', '03/01/19 12:00', '03/01/19 12:00', 
       '03/01/19 12:00', '03/01/19 12:00', '03/01/19 11:00', 
       '03/01/19 10:00', '03/01/19 10:00', '03/01/19 09:00',
       '03/01/19 09:00','03/01/19 09:00',]

# Nested defaultdict
result = defaultdict(lambda: defaultdict(int))

for date_time in dtl:
    date, time = date_time.split()
    result[date][time.split(':')[0]] += 1

Output (using pprint):

defaultdict(<function <lambda> at 0x7f20d5c37c80>,
            {'03/01/19': defaultdict(<class 'int'>,
                                     {'09': 3,
                                      '10': 2,
                                      '11': 1,
                                      '12': 5,
                                      '17': 1,
                                      '21': 1}),
             '05/01/19': defaultdict(<class 'int'>,
                                     {'14': 1,
                                      '17': 2,
                                      '21': 1}),
             '06/01/19': defaultdict(<class 'int'>, {'12': 2, '11': 1})})
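If the defaultdict wrappers in that output get in the way, one option is to copy both levels into ordinary dicts once counting is finished. This is only a sketch on a shortened sample list; sample and plain are illustrative names.

```python
from collections import defaultdict
from pprint import pprint

sample = ['05/01/19 17:00', '05/01/19 17:00', '03/01/19 09:00']

result = defaultdict(lambda: defaultdict(int))
for date_time in sample:
    date, time = date_time.split()
    result[date][time.split(':')[0]] += 1

# Copy both levels into plain dicts; only hours that were actually counted survive
plain = {date: dict(hours) for date, hours in result.items()}
pprint(plain)
```

After the conversion, looking up a missing date or hour raises KeyError again instead of silently creating an entry.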

If you really want the zeros to show when printing, I don't see a way around keeping a list of the hour labels, as I've done here, and initializing each date's dict from it.

times = ['00', '01', '02', '03', '04', '05', '06', '07', '08', '09', '10',
         '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21',
         '22', '23']

dtl = ['06/01/19 12:00', '06/01/19 12:00', '06/01/19 11:00', 
       '05/01/19 21:00', '05/01/19 17:00', '05/01/19 17:00', 
       '05/01/19 14:00', '03/01/19 21:00', '03/01/19 17:00',
       '03/01/19 12:00', '03/01/19 12:00', '03/01/19 12:00', 
       '03/01/19 12:00', '03/01/19 12:00', '03/01/19 11:00', 
       '03/01/19 10:00', '03/01/19 10:00', '03/01/19 09:00',
       '03/01/19 09:00','03/01/19 09:00']

result = {date_time.split()[0] : {time : 0 for time in times} for date_time in dtl}

for date_time in dtl:
    date, time = date_time.split()
    result[date][time.split(':')[0]] += 1

Output below:

{'06/01/19': {'00': 0, '01': 0, '02': 0, '03': 0, '04': 0, '05': 0, '06': 0, '07': 0, '08': 0, '09': 0, '10': 0, '11': 1, '12': 2, '13': 0, '14': 0, '15': 0, '16': 0, '17': 0, '18': 0, '19': 0, '20': 0, '21': 0, '22': 0, '23': 0}, '05/01/19': {'00': 0, '01': 0, '02': 0, '03': 0, '04': 0, '05': 0, '06': 0, '07': 0, '08': 0, '09': 0, '10': 0, '11': 0, '12': 0, '13': 0, '14': 1, '15': 0, '16': 0, '17': 2, '18': 0, '19': 0, '20': 0, '21': 1, '22': 0, '23': 0}, '03/01/19': {'00': 0, '01': 0, '02': 0, '03': 0, '04': 0, '05': 0, '06': 0, '07': 0, '08': 0, '09': 3, '10': 2, '11': 1, '12': 5, '13': 0, '14': 0, '15': 0, '16': 0, '17': 1, '18': 0, '19': 0, '20': 0, '21': 1, '22': 0, '23': 0}}

Upvotes: 2

Jonas B.

Reputation: 172

One quick and dirty way is this:

#!/usr/bin/env python3

def convert(dt):
    ret = {}
    for elem in dt:
        d, t = elem.split()
        # keep only the hour part, e.g. '09' from '09:00'
        t = t.split(":")[0]

        # first time we see this date: start all 24 hours at zero
        if d not in ret:
            ret[d] = {'%02d' % h: 0 for h in range(24)}

        # count the event, including the first one seen for a date
        ret[d][t] += 1
    return ret

dtl = ['06/01/19 12:00', '06/01/19 12:00', '06/01/19 11:00',
       '05/01/19 21:00', '05/01/19 17:00', '05/01/19 17:00',
       '05/01/19 14:00', '03/01/19 21:00', '03/01/19 17:00',
       '03/01/19 12:00', '03/01/19 12:00', '03/01/19 12:00',
       '03/01/19 12:00', '03/01/19 12:00', '03/01/19 11:00',
       '03/01/19 10:00', '03/01/19 10:00', '03/01/19 09:00',
       '03/01/19 09:00', '03/01/19 09:00']

print(convert(dtl))

Upvotes: 0
