Bit Bucket

Extracting Data From Multiple Dictionaries

My data structures:

phase1_hits = {
    '1.2.3.4': {'hits': 3, 'internal': '10.28.30.153', 'public_additional': ['8.2.17.14'], 'list': 'Red', 'internal_additional': ['10.17.100.74', '10.19.70.77', '10.28.30.153']},
    '2.3.4.5': {'hits': 19, 'internal': '10.19.40.175', 'public_additional': ['1.2.227.49'], 'list': 'Red', 'internal_additional': ['10.19.40.175']},
    '12.23.34.45': {'hits': 52, 'internal': '192.168.164.32', 'public_additional': ['8.2.17.14'], 'list': 'Orange', 'internal_additional': ['192.168.164.32', '192.168.164.42', '192.168.164.49']},
    '8.8.8.8': {'hits': 5, 'internal': '192.168.1.10', 'public_additional': ['8.8.87.153', '1.2.3.4'], 'list': 'Green', 'internal_additional': ['192.168.168.250']}
}


phase2_hits = {
    97536: {'ip.dst': ['8.2.17.14'], 'ip.src': ['10.28.30.153']},
    60096: {'ip.dst': ['8.2.17.14'], 'ip.src': ['192.168.164.42']}, 
    43140: {'ip.dst': ['8.2.17.9'], 'ip.src': ['10.153.134.201']},
    43789: {'ip.dst': ['10.28.30.153'], 'ip.src': ['8.2.17.9']},
    89415: {'ip.dst': ['8.2.17.14'], 'ip.src': ['10.153.134.200']}
}

Facts about the data structure (maybe none of this matters??):

If a phase1_hits internal or internal_additional IP is seen in phase2_hits, I want to extract the corresponding:

The key concept of the extraction is to match up which private IP(s) talked to which public IP(s). Also, if it would help, I can restructure phase1_hits and use a different key.
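If restructuring is an option, one possibility is to build a reverse index keyed by internal IP. Each phase2_hits IP can then be checked with a single dictionary lookup instead of scanning every phase1 entry. This is a sketch under my own assumptions: build_internal_index and by_internal are hypothetical names, and the sample phase1_hits is trimmed to two entries.

```python
from collections import defaultdict

def build_internal_index(phase1_hits):
    """Map each internal IP to the phase1 public IPs (keys) that list it."""
    by_internal = defaultdict(list)
    for public_ip, info in phase1_hits.items():
        # dict.fromkeys drops the repeat when 'internal' also appears in
        # 'internal_additional', while keeping the original order
        for ip in dict.fromkeys([info['internal']] + info['internal_additional']):
            by_internal[ip].append(public_ip)
    return dict(by_internal)

# Trimmed sample of the question's data
phase1_hits = {
    '1.2.3.4': {'hits': 3, 'internal': '10.28.30.153', 'public_additional': ['8.2.17.14'], 'list': 'Red', 'internal_additional': ['10.17.100.74', '10.19.70.77', '10.28.30.153']},
    '12.23.34.45': {'hits': 52, 'internal': '192.168.164.32', 'public_additional': ['8.2.17.14'], 'list': 'Orange', 'internal_additional': ['192.168.164.32', '192.168.164.42', '192.168.164.49']},
}

by_internal = build_internal_index(phase1_hits)
print(by_internal['10.28.30.153'])    # ['1.2.3.4']
print(by_internal['192.168.164.42'])  # ['12.23.34.45']
```

With this index, checking whether a phase2_hits IP belongs to a phase1 entry is just ip in by_internal.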

Answers (1)

Edwin

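matches = []
for ip1 in phase1_hits:
    subdict1 = phase1_hits[ip1]
    # 'internal' plus 'internal_additional' (the same IP can appear in both)
    internal_ips = [subdict1['internal']] + subdict1['internal_additional']
    hits = subdict1['hits']
    color_list = subdict1['list']
    for ip2 in phase2_hits:
        subdict2 = phase2_hits[ip2]
        # every IP in this phase2 record, destinations first
        phase2_ips = subdict2['ip.dst'] + subdict2['ip.src']
        # phase1 internal IPs that also appear in this phase2 record
        overlap = [i for i in internal_ips if i in phase2_ips]
        if len(overlap) > 0:
            # (phase1 key, overlap, hits, list color, phase2 key, matching phase2 IPs)
            temp = (ip1, overlap, hits, color_list, ip2, [i for i in phase2_ips if i in overlap])
            matches.append(temp)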

Here's the explanation:

matches = []

You need to store the matches somewhere you can use them later. A list is the simplest choice, since it grows as you append to it, whatever the number of matches turns out to be. A dictionary would also work, but each match here is a standalone record rather than something you need to look up by key, so a list is enough.
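If you later do want to look matches up by phase1 key, grouping the list into a dictionary of lists is a small step. This is a sketch: matches_by_ip is a hypothetical name, and the two sample tuples are written in the same six-field shape the loop builds rather than copied from a real run.

```python
from collections import defaultdict

# Two sample tuples in the same shape the answer builds:
# (phase1 key, overlap, hits, color_list, phase2 key, phase2 IPs in overlap)
matches = [
    ('1.2.3.4', ['10.28.30.153'], 3, 'Red', 97536, ['10.28.30.153']),
    ('12.23.34.45', ['192.168.164.42'], 52, 'Orange', 60096, ['192.168.164.42']),
]

# Group by the phase1 public IP so you can ask "what matched 1.2.3.4?"
matches_by_ip = defaultdict(list)
for match in matches:
    matches_by_ip[match[0]].append(match)

print(len(matches_by_ip['1.2.3.4']))  # 1
```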

for ip1 in phase1_hits:

Iterating over a dictionary directly yields its keys, so calling .keys() is not necessary. (Either way, avoid adding or deleting keys while looping over a dictionary; Python raises a RuntimeError if its size changes during iteration.)
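A quick check of that point: iterating a dict and iterating its .keys() visit the same keys in the same insertion order, which Python guarantees from 3.7 on. If you also need the value, .items() hands you both and saves the separate phase1_hits[ip1] lookup. The phase1_hits below is a trimmed sample.

```python
phase1_hits = {
    '1.2.3.4': {'hits': 3, 'list': 'Red'},
    '8.8.8.8': {'hits': 5, 'list': 'Green'},
}

# Plain iteration over a dict yields its keys
assert list(phase1_hits) == list(phase1_hits.keys())

# .items() yields (key, value) pairs, so no separate lookup is needed
hits_by_ip = [(ip1, subdict1['hits']) for ip1, subdict1 in phase1_hits.items()]
print(hits_by_ip)  # [('1.2.3.4', 3), ('8.8.8.8', 5)]
```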

    subdict1 = phase1_hits[ip1]
    internal_ips = [subdict1['internal']] + subdict1['internal_additional']
    hits = subdict1['hits']
    color_list = subdict1['list']

I aliased the subdictionary as subdict1 for readability; this is not necessary. On the next line, I used list concatenation (the + operator, i.e. list's __add__), which joins the elements of two lists into a new list. Wrapping subdict1['internal'] in brackets turns the single IP string into a one-element list, so it can be combined with internal_additional, which can hold several IPs. Then, because you want the number of hits and the 'list' value from the subdictionary, I stored them in hits and color_list. I did not name the variable list, because that would shadow Python's built-in list type.
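To make the concatenation point concrete: + returns a new list and leaves both inputs alone. One side effect with this data is that internal can also appear in internal_additional (for '1.2.3.4', 10.28.30.153 is in both), so internal_ips holds it twice and overlap will repeat it. A sketch of one possible fix, using dict.fromkeys to drop repeats while keeping the order:

```python
subdict1 = {
    'internal': '10.28.30.153',
    'internal_additional': ['10.17.100.74', '10.19.70.77', '10.28.30.153'],
}

# + builds a new list; subdict1['internal_additional'] is not modified
internal_ips = [subdict1['internal']] + subdict1['internal_additional']
print(internal_ips.count('10.28.30.153'))  # 2

# dict keys are unique and keep insertion order (Python 3.7+),
# so this removes duplicates without reordering
unique_ips = list(dict.fromkeys(internal_ips))
print(unique_ips)  # ['10.28.30.153', '10.17.100.74', '10.19.70.77']
```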

    for ip2 in phase2_hits:
        subdict2 = phase2_hits[ip2]
        phase2_ips = subdict2['ip.dst'] + subdict2['ip.src']
        overlap = [i for i in internal_ips if i in phase2_ips]

The only new thing here is overlap. Using a list comprehension, we collect every internal IP that also appears in the phase2 record (hence the name). You can call it whatever you want; just know that it will hold all the values common to both lists. The pattern is reusable: [i for i in L1 if i in L2], where L1 and L2 are both lists.
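Here is that pattern on its own. Each i in L2 check scans the whole list, which is fine at this size; if the lists grow, converting L2 to a set makes each membership test roughly constant time. The set operator & also gives the overlap directly, but it drops repeats and does not preserve order. L1 and L2 are sample values taken from the question's data.

```python
L1 = ['10.28.30.153', '10.17.100.74', '10.19.70.77']
L2 = ['8.2.17.14', '10.28.30.153']

# List comprehension: keeps L1's order (and any repeats in L1)
overlap = [i for i in L1 if i in L2]

# Same result, but membership is checked against a set
L2_set = set(L2)
overlap_fast = [i for i in L1 if i in L2_set]

# Set intersection: unique values only, order not guaranteed
overlap_set = set(L1) & set(L2)

print(overlap)      # ['10.28.30.153']
print(overlap_set)  # {'10.28.30.153'}
```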

        if len(overlap) > 0:
            temp = (ip1, overlap, hits, color_list, ip2, [i for i in phase2_ips if i in overlap])
            matches.append(temp)

The if statement ensures there is at least one overlap between the phase1_hits internal or internal_additional IPs and the IPs in the phase2_hits record. If so, it builds a tuple from this information and appends it to the matches list. I chose a tuple because it is immutable and its structure is fixed, but you can use a list instead if you need to modify the entries later.
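Since the structure is fixed, a collections.namedtuple keeps the tuple's immutability but lets you read fields by name instead of by position. The type name Match and its field names are my own labels for the six positions in temp, not anything from the question.

```python
from collections import namedtuple

# Hypothetical field names for the six positions in temp
Match = namedtuple(
    'Match',
    ['phase1_ip', 'overlap', 'hits', 'color_list', 'phase2_id', 'phase2_matches'],
)

m = Match('12.23.34.45', ['192.168.164.42'], 52, 'Orange', 60096, ['192.168.164.42'])
print(m.hits, m.phase2_id)  # 52 60096

# Still a tuple: indexing and unpacking work as before
assert m[0] == '12.23.34.45'
```

To use it in the loop, replace temp = (...) with temp = Match(...) using the same arguments.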

Once both loops finish, matches holds one tuple for each pairing of a phase1_hits entry and a phase2_hits record that share at least one internal IP.
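Putting the pieces together, here is a variant sketch (not a drop-in replacement for the tuple above) that iterates with .items(), deduplicates the internal IPs, uses sets for membership tests, and, since the goal is to pair private IPs with public ones, records the other IPs in each phase2 record as peers. The peers field is an assumption on my part: it treats every phase2 IP that is not one of the entry's internal IPs as the counterpart. phase1_hits is trimmed to the fields the loop reads.

```python
phase1_hits = {
    '1.2.3.4': {'hits': 3, 'internal': '10.28.30.153', 'list': 'Red',
                'internal_additional': ['10.17.100.74', '10.19.70.77', '10.28.30.153']},
    '2.3.4.5': {'hits': 19, 'internal': '10.19.40.175', 'list': 'Red',
                'internal_additional': ['10.19.40.175']},
    '12.23.34.45': {'hits': 52, 'internal': '192.168.164.32', 'list': 'Orange',
                    'internal_additional': ['192.168.164.32', '192.168.164.42', '192.168.164.49']},
    '8.8.8.8': {'hits': 5, 'internal': '192.168.1.10', 'list': 'Green',
                'internal_additional': ['192.168.168.250']},
}

phase2_hits = {
    97536: {'ip.dst': ['8.2.17.14'], 'ip.src': ['10.28.30.153']},
    60096: {'ip.dst': ['8.2.17.14'], 'ip.src': ['192.168.164.42']},
    43140: {'ip.dst': ['8.2.17.9'], 'ip.src': ['10.153.134.201']},
    43789: {'ip.dst': ['10.28.30.153'], 'ip.src': ['8.2.17.9']},
    89415: {'ip.dst': ['8.2.17.14'], 'ip.src': ['10.153.134.200']},
}

matches = []
for ip1, subdict1 in phase1_hits.items():
    # Unique internal IPs for this entry, in their original order
    internal_ips = list(dict.fromkeys([subdict1['internal']] + subdict1['internal_additional']))
    internal_set = set(internal_ips)
    for ip2, subdict2 in phase2_hits.items():
        phase2_ips = subdict2['ip.dst'] + subdict2['ip.src']
        phase2_set = set(phase2_ips)
        overlap = [i for i in internal_ips if i in phase2_set]
        if overlap:
            # Hypothetical: the remaining IPs in the record are treated as the peers
            peers = [i for i in phase2_ips if i not in internal_set]
            matches.append((ip1, overlap, subdict1['hits'], subdict1['list'], ip2, peers))

for match in matches:
    print(match)
# ('1.2.3.4', ['10.28.30.153'], 3, 'Red', 97536, ['8.2.17.14'])
# ('1.2.3.4', ['10.28.30.153'], 3, 'Red', 43789, ['8.2.17.9'])
# ('12.23.34.45', ['192.168.164.42'], 52, 'Orange', 60096, ['8.2.17.14'])
```

Because peers is taken from both ip.dst and ip.src, it picks up the public side whichever direction the traffic went (record 43789 has the internal IP as the destination).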
