curious_cosmo

Reputation: 1214

Inserting empty values into a Python dictionary

I have a Python dictionary that I ultimately want to insert into a MySQL database. I'm parsing data from a list called "entries", which looks like this (each # stands for a digit):

entries = [ "['data'] runtime: ###, scan: ###", 
            "['data'] ctime: ###, scan: ###", 
            "['data'] runtime: ###", ... ]

Each quoted string is a separate entry. I use regular expressions to extract the runtimes, ctimes, and scans associated with each entry like so:

import re

# (key, regex) pairs; raw strings keep the backslashes intact
terms = (["runtime", r"runtime\s?:\s?(\d+)"],
         ["ctime", r"ctime\s?:\s?(\d+)"],
         ["scan", r"scan\s?:\s?(\d+)"])

def getTerm(term, entries):
    # Searches the stringified list as a whole, so matches are not tied to an entry
    pattern = re.compile(term)
    return pattern.findall(str(entries))

d = {}
for name, regex in terms:
    d[name] = getTerm(regex, entries)

This works. However, as you can see, not all of the entries have a runtime, ctime, and scan. If a value doesn't appear in an entry, I want it to be entered into my dictionary as [] or NULL (or None), so that the element at a given index of each key's list always belongs to the same entry. I want my dictionary to look like this:

d = {'ctime': [None, '###', None], 'runtime': ['###', None, '###'], 'scan': ['###', '###', None]}

How do I do this?

Upvotes: 1

Views: 204

Answers (2)

brennan

Reputation: 3493

If entries is a list of strings that may or may not contain the keywords, and order is important, then we'll need to iterate over the entries one at a time:

First option:

import re

entries = [ "['data'] runtime: ###, scan: ###",
            "['data'] ctime: ###, scan: ###",
            "['data'] runtime: ###" ]

# The character class includes '#' so the placeholder values above also match
allterms = (["runtime", r"runtime\s?:\s?([a-zA-Z0-9_#]*)"],
            ["ctime", r"ctime\s?:\s?([a-zA-Z0-9_#]*)"],
            ["scan", r"scan\s?:\s?([a-zA-Z0-9_#]*)"])
terms = [name for name, _ in allterms]
patterns = [re.compile(regex) for _, regex in allterms]

# Keys must be the term names; the [name, regex] lists themselves are unhashable
d = {t: [] for t in terms}

def get_terms(entry):
    # Append one value per term, or None when the term is missing from this entry
    for name, pattern in zip(terms, patterns):
        match = pattern.search(entry)
        d[name].append(match.group(1) if match else None)

for entry in entries:
    get_terms(entry)
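To see why searching entry by entry matters, here is a minimal sketch that compares it with a single findall over the whole list. The numbers in the sample entries are made up to stand in for ###.

```python
import re

# Hypothetical sample data with numbers in place of ###
entries = ["['data'] runtime: 10, scan: 1",
           "['data'] ctime: 20, scan: 2",
           "['data'] runtime: 30"]

pattern = re.compile(r"ctime\s?:\s?(\d+)")

# Searching the stringified list at once drops the entries that have no ctime
flat = pattern.findall(str(entries))

# Searching entry by entry keeps exactly one slot per entry
aligned = []
for entry in entries:
    match = pattern.search(entry)
    aligned.append(match.group(1) if match else None)

print(flat)     # ['20']
print(aligned)  # [None, '20', None]
```

Only the second list can be lined up by index with the lists for the other keys.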

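Second option, using a thread pool:

# pip install futures  # if using Python 2
from concurrent.futures import ThreadPoolExecutor

def extract(entry):
    # Return one value (or None) per term instead of mutating d from worker threads
    return [m.group(1) if m else None for m in (re.search(p, entry) for p in patterns)]

d = {t: [] for t in terms}
with ThreadPoolExecutor() as executor:
    # executor.map yields results in input order, so list indices stay aligned
    for values in executor.map(extract, entries):
        for t, value in zip(terms, values):
            d[t].append(value)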

Edit: Solution developed in chat collab with @Wynne :)

Upvotes: 1

Hrabal

Reputation: 2525

re.findall() returns an empty list ([]) when no match is found, so you don't need an empty fallback. If you want None when no term is found, as Brennan said, use findall(string) or None.
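A quick check of that behaviour, as a minimal sketch. The sample strings and numbers are made up for illustration.

```python
import re

pattern = re.compile(r"ctime\s?:\s?(\d+)")

# Hypothetical entries with numbers standing in for ###
with_ctime = "['data'] ctime: 42, scan: 7"
without_ctime = "['data'] runtime: 13"

found = pattern.findall(with_ctime)        # one captured group per match
missing = pattern.findall(without_ctime)   # empty list when nothing matches
# An empty list is falsy, so "or None" swaps it for None
missing_or_none = pattern.findall(without_ctime) or None

print(found, missing, missing_or_none)     # ['42'] [] None
```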

Consider using a list comprehension to loop over all your entries, and a dict comprehension to apply each regex pattern to the same entry and save the results in a dict.

import re
terms = (["runtime", re.compile(r"runtime\s?:\s?(\d+)")],
         ["ctime", re.compile(r"ctime\s?:\s?(\d+)")],
         ["scan", re.compile(r"scan\s?:\s?(\d+)")])
# "or None" turns findall's empty list into None for missing terms
results = [{term: pattern.findall(entry) or None for term, pattern in terms} for entry in entries]

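Now you have something like this (findall returns a list of matches, so each found value is a list):

[{"runtime": ['###'], "ctime": None, "scan": ['###']}, {"runtime": None, "ctime": ['###'], "scan": ['###']}, {"runtime": ['###'], "ctime": None, "scan": None}, ...]

The above code is equivalent to the following loop, and usually a little faster: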

results = []
for entry in entries:
    entry_dict = {}
    for term, regex_pattern in terms:
        entry_dict[term] = regex_pattern.findall(entry) or None
    results.append(entry_dict)
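To get from this list of per-entry dicts to the shape asked for in the question (one list per key, aligned by entry index), one possible sketch is below. The sample entries use made-up numbers in place of ###, to_columns is a hypothetical helper name, and taking only the first match assumes each term appears at most once per entry.

```python
import re

terms = (["runtime", re.compile(r"runtime\s?:\s?(\d+)")],
         ["ctime", re.compile(r"ctime\s?:\s?(\d+)")],
         ["scan", re.compile(r"scan\s?:\s?(\d+)")])

# Hypothetical sample data with numbers in place of ###
entries = ["['data'] runtime: 10, scan: 1",
           "['data'] ctime: 20, scan: 2",
           "['data'] runtime: 30"]

results = [{term: pattern.findall(entry) or None for term, pattern in terms}
           for entry in entries]

def to_columns(rows, keys):
    """Turn a list of per-entry dicts into one list per key, aligned by index."""
    columns = {key: [] for key in keys}
    for row in rows:
        for key in keys:
            matches = row[key]
            # Assumption: at most one match per entry, so keep the first one
            columns[key].append(matches[0] if matches else None)
    return columns

d = to_columns(results, [term for term, _ in terms])
print(d)
# {'runtime': ['10', None, '30'], 'ctime': [None, '20', None], 'scan': ['1', '2', None]}
```

Every list has one slot per entry, so index i in each list refers to entries[i].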

Upvotes: 0
