the_real_one

Reputation: 467

Maps and List Comprehension in Python

So I am fairly new to the concept of list comprehension and map/filter/reduce, but I feel like this could be done with fewer lines and less indentation:

import os
import re

# LOG_FILE_EXTENSION and LOG_LINE_REGEX are constants defined elsewhere in the script.
def count_tlsv1_ips(directory_path):  # illustrative name; wrapper makes `return` valid
    ip_tlsv1_counts = {}

    for filename in os.listdir(directory_path):
        if filename.endswith(LOG_FILE_EXTENSION):
            with open(os.path.join(directory_path, filename)) as file_handle:
                for line_contents in file_handle:
                    line_groups = re.search(LOG_LINE_REGEX, line_contents)
                    if line_groups and line_groups.group(8) == "TLSv1":
                        if line_groups.group(2) not in ip_tlsv1_counts:
                            ip_tlsv1_counts[line_groups.group(2)] = 1
                        else:
                            ip_tlsv1_counts[line_groups.group(2)] += 1

    return ip_tlsv1_counts
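For comparison, here is a minimal sketch of the comprehension style the question asks about. Because the real constants aren't shown, it assumes hypothetical values for LOG_FILE_EXTENSION and LOG_LINE_REGEX (an eight-field, space-separated format where group 2 is the IP and group 8 the protocol); read_lines and count_tlsv1_ips_compact are illustrative names. It reads each file fully with readlines(), which makes closing the handle simple but holds one whole file in memory at a time.

```python
import collections
import os
import re
import tempfile

# Hypothetical stand-ins for the question's constants; the real log format is not shown.
LOG_FILE_EXTENSION = ".log"
# Eight space-separated fields: group(2) is the client IP, group(8) the protocol.
LOG_LINE_REGEX = r"^(\S+) (\S+) (\S+) (\S+) (\S+) (\S+) (\S+) (\S+)$"


def read_lines(path):
    # Read the whole file so its handle is closed before the next file is opened.
    with open(path) as file_handle:
        return file_handle.readlines()


def count_tlsv1_ips_compact(directory_path):
    # The "for" and "if" clauses run left to right, like the nested loops above.
    matches = (
        re.search(LOG_LINE_REGEX, line)
        for filename in os.listdir(directory_path)
        if filename.endswith(LOG_FILE_EXTENSION)
        for line in read_lines(os.path.join(directory_path, filename))
    )
    # Counter replaces the manual "if key not in dict ... else += 1" bookkeeping.
    return collections.Counter(
        m.group(2) for m in matches if m and m.group(8) == "TLSv1"
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "a.log"), "w") as f:
            f.write("t1 10.0.0.1 GET / 200 10 RSA TLSv1\n")
            f.write("t2 10.0.0.2 GET / 200 10 RSA TLSv1.2\n")
        with open(os.path.join(tmp, "b.txt"), "w") as f:
            f.write("t3 10.0.0.3 GET / 200 10 RSA TLSv1\n")
        print(count_tlsv1_ips_compact(tmp))  # Counter({'10.0.0.1': 1})
```

The generator expression keeps the flat, single-indentation shape the question is after; whether it reads better than the explicit loop is largely a matter of taste.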

Upvotes: 0

Views: 99

Answers (2)

Tadhg McDonald-Jensen

Reputation: 21453

I'd use one generator function to iterate over all the files with a particular extension in a particular folder:

def filter_files(directory, extension):
    for filename in os.listdir(directory):
        if filename.endswith(extension):
            with open(os.path.join(directory, filename)) as file_handle:
                yield file_handle

Then, to iterate over all the lines of all those files, you'd just use itertools.chain.from_iterable on the result of that generator.
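A small sketch of why this works, using io.StringIO objects as stand-ins for real files; open_all and all_lines are hypothetical names, not part of the answer. Each handle stays open while chain.from_iterable reads it and is closed by the with block only when the generator resumes for the next one.

```python
import io
import itertools


def open_all(texts):
    """Hypothetical stand-in for filter_files: yields one open in-memory 'file' per text."""
    for text in texts:
        with io.StringIO(text) as handle:
            # The handle stays open while the consumer iterates over it; the
            # with block closes it only when the generator is resumed.
            yield handle


def all_lines(texts):
    # chain.from_iterable iterates each yielded handle in turn, producing one
    # flat stream of lines across all the "files".
    return list(itertools.chain.from_iterable(open_all(texts)))


print(all_lines(["a1\na2\n", "b1\n"]))  # ['a1\n', 'a2\n', 'b1\n']

# Materialising the generator first hands back handles that are already closed.
handles = list(open_all(["x\n"]))
print(handles[0].closed)  # True
```

The last two lines show the one caveat of yielding from inside a with block: the handles must be consumed lazily, as chain.from_iterable does, rather than collected into a list first.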

Next, you can use re.compile(LOG_LINE_REGEX) to get a compiled pattern. This can give a small performance boost (re.search already caches recently used patterns, so the gain is modest), and it lets you pass the pattern's .search method directly to map:

log_line_re = re.compile(LOG_LINE_REGEX)
all_log_lines = itertools.chain.from_iterable(filter_files(directory_path, LOG_FILE_EXTENSION))

for line_groups in map(log_line_re.search, all_log_lines):
    if line_groups and line_groups.group(8) == "TLSv1":
        yield line_groups.group(2)

Placed inside a function, this loop makes it a generator that yields line_groups.group(2) for every line that meets the other conditions, so to count the frequencies you can simply build a collections.Counter from its output.
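To see the map and Counter step in isolation, here is a small sketch on an in-memory list of lines. The eight-field LOG_LINE_REGEX is a hypothetical stand-in for the real pattern, and tlsv1_ips is an illustrative name.

```python
import collections
import re

# Hypothetical log format with eight space-separated fields:
# group(2) is the client IP and group(8) the TLS protocol version.
LOG_LINE_REGEX = r"^(\S+) (\S+) (\S+) (\S+) (\S+) (\S+) (\S+) (\S+)$"
log_line_re = re.compile(LOG_LINE_REGEX)


def tlsv1_ips(lines):
    # log_line_re.search is a bound method, so map can call it on each line;
    # it returns a Match object, or None for lines that do not match.
    for line_groups in map(log_line_re.search, lines):
        if line_groups and line_groups.group(8) == "TLSv1":
            yield line_groups.group(2)


sample = [
    "t1 10.0.0.1 GET / 200 10 RSA TLSv1\n",
    "t2 10.0.0.2 GET / 200 10 RSA TLSv1.2\n",
    "not a log line\n",
    "t3 10.0.0.1 GET / 200 10 RSA TLSv1\n",
]
print(collections.Counter(tlsv1_ips(sample)))  # Counter({'10.0.0.1': 2})
```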

So the final code would look like this:

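import collections
import itertools
import os
import re

# LOG_FILE_EXTENSION and LOG_LINE_REGEX are constants defined elsewhere in the script.

def filter_files(directory, extension):
    # Yield an open handle for each file in `directory` ending with `extension`;
    # each file is closed when the consumer moves on to the next one.
    for filename in os.listdir(directory):
        if filename.endswith(extension):
            with open(os.path.join(directory, filename)) as file_handle:
                yield file_handle

def get_part_of_log_files(directory_path):
    log_line_re = re.compile(LOG_LINE_REGEX)
    # Flatten the per-file line iterators into one stream of lines.
    all_log_lines = itertools.chain.from_iterable(filter_files(directory_path, LOG_FILE_EXTENSION))

    for line_groups in map(log_line_re.search, all_log_lines):
        if line_groups and line_groups.group(8) == "TLSv1":
            yield line_groups.group(2)

def original_function(directory_path):
    return collections.Counter(get_part_of_log_files(directory_path))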

Upvotes: 0

newtover

Reputation: 32094

If you are using Python 3.4+, you can use the pathlib module:

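import re
from collections import Counter
from pathlib import Path

# LOG_FILE_EXTENSION and LOG_LINE_REGEX are constants defined elsewhere in the script.
def count_tlsv1_ips(directory_path):  # illustrative name; wrapper makes `return` valid
    ip_tlsv1_counts = Counter()

    for path in Path(directory_path).glob('*' + LOG_FILE_EXTENSION):
        with path.open() as f1:
            for line in f1:
                line_groups = re.search(LOG_LINE_REGEX, line)
                if line_groups and line_groups.group(8) == "TLSv1":
                    # Counter returns 0 for missing keys, so no membership check is needed.
                    ip_tlsv1_counts[line_groups.group(2)] += 1

    return ip_tlsv1_counts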
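The pathlib version can also be combined with the map and Counter ideas from the other answer to remove most of the nesting. This is a sketch under the same assumptions: the two constants are hypothetical stand-ins, and log_matches and count_tlsv1_ips_pathlib are illustrative names.

```python
import collections
import re
import tempfile
from pathlib import Path

# Hypothetical stand-ins for the question's constants.
LOG_FILE_EXTENSION = ".log"
LOG_LINE_REGEX = r"^(\S+) (\S+) (\S+) (\S+) (\S+) (\S+) (\S+) (\S+)$"


def log_matches(directory_path):
    log_line_re = re.compile(LOG_LINE_REGEX)
    # glob("*.log") lists matching entries directly inside the directory.
    for path in Path(directory_path).glob("*" + LOG_FILE_EXTENSION):
        with path.open() as f1:
            # yield from drains the lazy map while the file is still open.
            yield from map(log_line_re.search, f1)


def count_tlsv1_ips_pathlib(directory_path):
    return collections.Counter(
        m.group(2)
        for m in log_matches(directory_path)
        if m and m.group(8) == "TLSv1"
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "a.log").write_text("t1 10.0.0.1 GET / 200 10 RSA TLSv1\n")
        Path(tmp, "b.txt").write_text("t2 10.0.0.2 GET / 200 10 RSA TLSv1\n")
        print(count_tlsv1_ips_pathlib(tmp))  # Counter({'10.0.0.1': 1})
```

Path.glob with this pattern is not recursive; Path.rglob would also search subdirectories.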

Upvotes: 1
