iNoob

Reputation: 1395

Manipulating Python dictionaries to remove empty values

I'm trying to remove a key/value pair from a dictionary if its value is empty.

I have tried the following dictionary comprehension and tried doing it in long form, but it doesn't seem to actually do anything and I get no errors.

def get_Otherfiles():
    regs = ["(.*)((U|u)ser(.*))(\s=\s\W\w+\W)", "(.*)((U|u)ser(.*))(\s=\s\w+)", "(.*)((P|p)ass(.*))\s=\s(\W(.*)\W)", "(.*)((P|p)ass(.*))(\s=\s\W\w+\W)"]
    combined = "(" + ")|(".join(regs) + ")"
    cred_results = []
    creds = []
    un_matched = []
    filesfound = []
    d = {}
    for root, dirs, files in os.walk(dir):
        for filename in files:
            if filename.endswith(('.bat', '.vbs', '.ps', '.txt')):
                readfile = open(os.path.join(root, filename), "r")
                d.setdefault(filename, [])
                for line in readfile:
                    m = re.match(combined, line)
                    if m:
                        d[filename].append(m.group(0).rstrip())
                    else:
                        pass
    result = d.copy()
    result.update((k, v) for k, v in d.iteritems() if v is not None)
    print result

Current output:

{'debug.txt': [], 'logonscript1.vbs': ['strUser = "guytom"', 'strPassword = "P@ssw0rd1"'], 'logonscript2.bat': ['strUsername = "guytom2"', 'strPass = "SECRETPASSWORD"']}

As you can see I have entries with empty values. I'd like to remove these before printing the data.

Upvotes: 0

Views: 99

Answers (3)

chepner

Reputation: 530783

In this part of your code:

            d.setdefault(filename, [])
            for line in readfile:
                m = re.match(combined, line)
                if m:
                    d[filename].append(m.group(0).rstrip())
                else:
                    pass

You always add filename as a key to the dictionary, even if you don't subsequently add anything to the resulting list. Try

            for line in readfile:
                m = re.match(combined, line)
                if m:
                    d.setdefault(filename, []).append(m.group(0).rstrip())

which will only initialize d[filename] to an empty list if it is actually necessary to have something on which to call append.
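Here is a minimal sketch of that pattern that runs without touching the file system. The sample_files dict and the simplified pattern are stand-ins I made up for illustration; they are not the directory walk or the combined regex from the question.

```python
import re

# Hypothetical sample data standing in for the files on disk.
sample_files = {
    "debug.txt": ["nothing interesting here\n"],
    "logonscript1.vbs": ['strUser = "guytom"\n', 'strPassword = "P@ssw0rd1"\n'],
}

# Simplified stand-in for the question's combined regex.
pattern = re.compile(r".*(?:[Uu]ser|[Pp]ass).*\s=\s\S+")

d = {}
for filename, lines in sample_files.items():
    for line in lines:
        m = pattern.match(line)
        if m:
            # The key is created only when there is something to append.
            d.setdefault(filename, []).append(m.group(0).rstrip())

print(d)
```

Since debug.txt never produces a match, setdefault is never called for it and the key never appears in d.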

Upvotes: 1

tdelaney

Reputation: 77337

Looking at the first group in your regex, (.*): if the regex matches but that group has no characters to match, the group is "", not None. So an "is not None" test never filters anything; test for truthiness instead, and build a new dict, because update cannot remove keys.

result = dict((k, v) for k, v in d.iteritems() if v)

But you can also have your regex do that part for you. Change that first group to (.+) and you won't have empty values to filter out.
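To see what the change from (.*) to (.+) does, here is a small sketch with toy patterns (not the full regex from the question). Note the side effect: a line that starts directly with the keyword stops matching at all.

```python
import re

# Toy patterns, not the full regex from the question.
star = re.match(r"(.*)user", "user")
plus = re.match(r"(.+)user", "user")

print(repr(star.group(1)))  # the prefix group captured nothing
print(plus)                 # no match: (.+) needs at least one character
```

Whether dropping such lines is what you want depends on your input files, so treat this as an illustration of the behavior rather than a recommendation.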

EDIT

Instead of removing empty values at the end, you can avoid adding them to the dict altogether.

def get_Otherfiles():
    # fixes: make each pattern a raw string so that \s works right and
    # tighten up filtering, ... (U|u) should probably be [Uu] ...
    regs = [r"(.+)\s*((U|u)ser(.*))(\s=\s\W\w+\W)", r"(.*)((U|u)ser(.*))(\s=\s\w+)", r"(.*)((P|p)ass(.*))\s=\s(\W(.*)\W)", r"(.*)((P|p)ass(.*))(\s=\s\W\w+\W)"]
    combined = "(" + ")|(".join(regs) + ")"
    cred_results = []
    creds = []
    un_matched = []
    filesfound = []
    d = {}
    for root, dirs, files in os.walk(dir):
        for filename in files:
            if filename.endswith(('.bat', '.vbs', '.ps', '.txt')):
                # assuming you want to aggregate matching file names...
                content_list = d.get(filename, [])
                content_orig_len = len(content_list)
                # the with block closes the file once it has been read
                with open(os.path.join(root, filename), "r") as readfile:
                    for line in readfile:
                        m = re.match(combined, line)
                        if m:
                            content_list.append(m.group(0))
                # only store the list if this file contributed a match
                if len(content_list) > content_orig_len:
                    d[filename] = content_list
    return d

Upvotes: 0

Joran Beasley

Reputation: 113930

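result = dict((k, v) for k, v in d.iteritems() if v)

update won't remove entries; it only adds or changes them. Build a new dict instead, and test for truthiness (if v) rather than "is not None", because the empty values in your output are empty lists, not None.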

a = {"1":2}
a.update({"2":7})
print a # contains both "1" and "2" keys
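A minimal sketch of the difference, using a cut-down version of the output from the question. I use .items() in place of .iteritems() so it runs on both Python 2 and Python 3; the names updated and filtered are just for illustration.

```python
d = {
    "debug.txt": [],
    "logonscript1.vbs": ['strUser = "guytom"'],
}

# update() on a copy only adds or overwrites keys, so the empty entry survives.
updated = d.copy()
updated.update((k, v) for k, v in d.items() if v)

# Building a new dict keeps only the entries whose list is non-empty.
filtered = dict((k, v) for k, v in d.items() if v)

print(sorted(updated))
print(sorted(filtered))
```

The first print still lists debug.txt, the second does not.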

Upvotes: 0
