smatthewenglish


Python function to find the min/max based on a single attribute from a nested dictionary structure

The following data representation:

[
 {u'0xbd4f1cc0da707c5712651b659b86766ec6f25af5e388fc82474523339dd1da37': u'90000'},
 {u'0x05a04a7bb2500087c14bc89eb6a49cd4c5afcac63270aff2d4508e610f606eed': u'40000'},
 {u'0xc3f68d46b9e462110e4897a41b573a10fef72747fd4c9e8413eb2e4cba0af9b5': u'21000'},
 {u'0x79dcc6ab82b2024a0d4135d4fa3a5cd62ab740f28fffa3fc4dfdb8b00430baab': u'158971'},
 {u'0x034c9e7f28f136188ebb2a2630c26183b3df90c387490159b411cf7326764341': u'21000'},
 {u'0xffda7269775dcd710565c5e0289a2254c195e006f34cafc80c4a3c89f479606e': u'1000000'},
 {u'0x90ca439b7daa648fafee829d145adefa1dc17c064f43db77f573da873b641f19': u'90000'},
 {u'0x7cba9f140ab0b3ec360e0a55c06f75b51c83b2e97662736523c26259a730007f': u'40000'},
 {u'0x92dedff7dab405220c473aefd12e2e41d260d2dff7816c26005f78d92254aba2': u'21000'},
 {u'0x0abe75e40a954d4d355e25e4498f3580e7d029769897d4187c323080a0be0fdd': u'21000'},
 {u'0x22c2b6490900b21d67ca56066e127fa57c0af973b5d166ca1a4bf52fcb6cf81c': u'90000'},
 {u'0x8570106b0385caf729a17593326db1afe0d75e3f8c6daef25cd4a0499a873a6f': u'90000'},
 {u'0x8adfe7fc3cf0eb34bb56c59fa3dc4fdd3ec3f3514c0100fef800f065219b7707': u'40000'},
 {u'0x8b0fe2b7727664a14406e7377732caed94315b026b37577e2d9d258253067553': u'21000'},
 {u'0x244b29b60c696f4ab07c36342344fe6116890f8056b4abc9f734f7a197c93341': u'50000'},
 {u'0xf2b5b8fb173e371cbb427625b0339f6023f8b4ec3701b7a5c691fa9cef9daf63': u'121000'},
 {u'0xf8f2a397b0f7bb1ff212b6bcc57e4a56ce3e27eb9f5839fef3e193c0252fab26': u'121000'}
]

Is generated from this loop:

import json

dict_hash_gas = list()
for line in inpt:  # inpt: an open file with one JSON object per line
    resource = json.loads(line)
    dict_hash_gas.append({resource['first']: resource['second']})

Based on data that looks, more or less, like so:

{"first":"A","second":"1","third":"2"} 
{"first":"B","second":"1","third":"2"} 
{"first":"C","second":"2","third":"2"} 
{"first":"D","second":"3","third":"2"} 
{"first":"E","second":"3","third":"2"} 
{"first":"F","second":"3","third":"2"} 
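To reproduce the structure without the original file, the loop can be fed the sample lines through io.StringIO. The name `sample` is a hypothetical stand-in for `inpt`, assumed here for illustration. Note that the values remain strings after parsing, which matters when comparing them.

```python
import io
import json

# Hypothetical stand-in for the real input file: the sample lines above,
# one JSON object per line.
sample = io.StringIO(
    '{"first":"A","second":"1","third":"2"}\n'
    '{"first":"B","second":"1","third":"2"}\n'
    '{"first":"C","second":"2","third":"2"}\n'
    '{"first":"D","second":"3","third":"2"}\n'
    '{"first":"E","second":"3","third":"2"}\n'
    '{"first":"F","second":"3","third":"2"}\n'
)

dict_hash_gas = list()
for line in sample:
    resource = json.loads(line)
    # Each record becomes a one-item dict mapping "first" to "second".
    dict_hash_gas.append({resource['first']: resource['second']})

print(dict_hash_gas)
# [{'A': '1'}, {'B': '1'}, {'C': '2'}, {'D': '3'}, {'E': '3'}, {'F': '3'}]
```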

I'm trying to find the record whose "second" value is the largest, i.e.

{"first":"A","second":"LOOKING_FOR_MAX"}

How can I access all of the second values (the ones that look like u'90000') in that list of one-item dictionaries, and then record and output the max and the min?


To define terms precisely: in the example at the top, i.e.:

{u'0xbd4f1cc0da707c5712651b659b86766ec6f25af5e388fc82474523339dd1da37': u'90000'},
{u'0x05a04a7bb2500087c14bc89eb6a49cd4c5afcac63270aff2d4508e610f606eed': u'40000'},
{u'0xc3f68d46b9e462110e4897a41b573a10fef72747fd4c9e8413eb2e4cba0af9b5': u'21000'},

I'd like to search on the basis of u'90000', u'40000' and u'21000'; that's what I mean by the "second" value.

The max should be selected on the basis of the numeric value alone, so in that case it would be u'90000'.
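Because the values are strings, selecting "on the basis of the number alone" requires converting them; otherwise Python compares them character by character. A minimal illustration using a few of the values from the list at the top:

```python
# A few of the "second" values from the question, still as strings.
values = [u'90000', u'40000', u'21000', u'1000000']

# Comparing the strings directly is lexicographic: '9' > '4' > '2' > '1'.
print(max(values))           # 90000
# Passing key=int compares the numbers themselves.
print(max(values, key=int))  # 1000000
print(min(values, key=int))  # 21000
```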


EDIT:

Calling it in the following way produced the error shown below:

def _main():

    with open('transactions000000000029.json', 'rb') as inpt:
        dict_hash_gas = list()
        for line in inpt:
            resource = json.loads(line)
            dict_hash_gas.append({resource['hash']:resource['gas']})

    pairs = list(_as_pairs(dict_hash_gas))
    if pairs:
        # Avoid a ValueError from min() and max() if the list is empty.
        print(min(pairs, key=lambda pair: pair.value))
        print(max(pairs, key=lambda pair: pair.value))

[Screenshot of the resulting error message]


Answers (2)

Kevin J. Chase

Once you have your data in a tractable form, finding the min and max is a one-liner. In this case, since those dictionaries are obviously records of some sort, the ideal data type is either a custom class or a collections.namedtuple. I went with the namedtuple, since all the values are atomic and immutable. (It also comes with handy features like a readable __repr__ and value-based __eq__ and __hash__, and it uses less memory than an ordinary class instance.)

All of the effort below is in _as_pairs, which generates immutable key-value pairs from that frustrating list of one-item dictionaries. It also converts the stringified integers (value) into the actual integers you wish they were. After that, using the data is easy.

import collections

# FIXME:  Use more descriptive names than "Pair", "key", and "value".
Pair = collections.namedtuple('Pair', ['key', 'value'])

def _as_pairs(pairs):
    for pair in pairs:
        # TODO:  Verify the dict contains exactly one item?
        for k, v in pair.items():
            # Should the `key` string also be an integer?
            #yield Pair(key=int(k, base=16), value=int(v))
            yield Pair(key=k, value=int(v))

def _main():
    # Abbreviated below, but contains the same inputs as your example.
    dict_hash_gas = [
      ...,
      {u'0xffda...606e': u'1000000'},
      {u'0x90ca...1f19': u'90000'},
      ...,
      ]
    pairs = list(_as_pairs(dict_hash_gas))
    if pairs:
        # Avoid a ValueError from min() and max() if the list is empty.
        print(min(pairs, key=lambda pair: pair.value))
        print(max(pairs, key=lambda pair: pair.value))

if '__main__' == __name__:
    _main()

Output (Python 3):

Pair(key='0xc3f6...f9b5', value=21000)
Pair(key='0xffda...606e', value=1000000)
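As an alternative to the `if pairs:` guard, min() and max() accept a `default=` argument (Python 3.4+) that is returned instead of raising ValueError on empty input. A sketch, where `gas_extremes` is a hypothetical helper name and operator.attrgetter replaces the lambda:

```python
import collections
from operator import attrgetter

Pair = collections.namedtuple('Pair', ['key', 'value'])

def gas_extremes(pairs):
    """Return (min_pair, max_pair) by value, or (None, None) if pairs is empty.

    pairs should be a list (not a generator), since it is iterated twice.
    """
    by_value = attrgetter('value')
    # default= is returned instead of raising ValueError on an empty sequence.
    return (min(pairs, key=by_value, default=None),
            max(pairs, key=by_value, default=None))

print(gas_extremes([Pair('0xc3f6...f9b5', 21000), Pair('0xffda...606e', 1000000)]))
print(gas_extremes([]))  # (None, None)
```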

I've included a couple of suggestions in the comments:

  • Is it important that those dictionaries have exactly one item each?

  • Should those hexadecimal strings (which I called key) also be converted into integers?

I can't tell what you're using this for, so I can't answer either of those questions.


brian buck

Are you constrained to using dictionaries here? A list of tuples might be simpler to use:

dict_hash_gas = list()
for line in inpt:
    resource = json.loads(line)
    dict_hash_gas.append((resource['first'], resource['second']))

sorted_data = sorted(dict_hash_gas, key=lambda x: int(x[1]))
minimum = sorted_data[0]
maximum = sorted_data[-1]

This yields ('0xc3f68d46b9e462110e4897a41b573a10fef72747fd4c9e8413eb2e4cba0af9b5', '21000') for the minimum and ('0xffda7269775dcd710565c5e0289a2254c195e006f34cafc80c4a3c89f479606e', '1000000') for the maximum.
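Sorting the whole list is O(n log n) when only the two extremes are needed. Calling min() and max() with the same key makes two O(n) passes instead. A sketch on a small hypothetical sample in the same (key, value-as-string) shape:

```python
# Hypothetical sample in the same (key, value-as-string) shape.
dict_hash_gas = [('A', '90000'), ('B', '40000'), ('C', '21000'), ('D', '1000000')]

# min() and max() each make one linear pass; no full sort is needed.
minimum = min(dict_hash_gas, key=lambda x: int(x[1]))
maximum = max(dict_hash_gas, key=lambda x: int(x[1]))
print(minimum, maximum)  # ('C', '21000') ('D', '1000000')
```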

Edit: here is the same approach using collections.namedtuple:

from collections import namedtuple

DataItem = namedtuple('DataItem', ['first', 'second'])

dict_hash_gas = list()
for line in inpt:
    resource = json.loads(line)
    dict_hash_gas.append(DataItem(resource['first'], resource['second']))

sorted_data = sorted(dict_hash_gas, key=lambda x: int(x.second))
minimum = sorted_data[0]
maximum = sorted_data[-1]
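A variation on the namedtuple version: convert `second` to an int once, while building the records, so later comparisons need no conversion and operator.attrgetter can serve as the key. Here `lines` is a hypothetical stand-in for the input file, using the sample records from the question. With ties, min() and max() return the first matching item encountered.

```python
import json
from collections import namedtuple
from operator import attrgetter

DataItem = namedtuple('DataItem', ['first', 'second'])

# Hypothetical stand-in for the input file (sample records from the question).
lines = [
    '{"first":"A","second":"1","third":"2"}',
    '{"first":"B","second":"1","third":"2"}',
    '{"first":"C","second":"2","third":"2"}',
    '{"first":"D","second":"3","third":"2"}',
]

# Convert 'second' to int once, while building the records.
items = [DataItem(r['first'], int(r['second'])) for r in map(json.loads, lines)]

by_second = attrgetter('second')
print(min(items, key=by_second))  # DataItem(first='A', second=1)
print(max(items, key=by_second))  # DataItem(first='D', second=3)
```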
