Reputation: 467
I am fairly new to list comprehensions and map/filter/reduce, but I feel like this can be done in fewer lines and with less indentation:
ip_tlsv1_counts = {}
for filename in os.listdir(directory_path):
    if filename.endswith(LOG_FILE_EXTENSION):
        with open(os.path.join(directory_path, filename)) as file_handle:
            for line_contents in file_handle:
                line_groups = re.search(LOG_LINE_REGEX, line_contents)
                if line_groups and line_groups.group(8) == "TLSv1":
                    if not line_groups.group(2) in ip_tlsv1_counts:
                        ip_tlsv1_counts[line_groups.group(2)] = 1
                    else:
                        ip_tlsv1_counts[line_groups.group(2)] += 1
return ip_tlsv1_counts
Upvotes: 0
Views: 99
Reputation: 21453
I'd use one generator function to iterate over all the files with a particular extension in a particular folder:
def filter_files(directory, extension):
    for filename in os.listdir(directory):
        if filename.endswith(extension):
            with open(os.path.join(directory, filename)) as file_handle:
                yield file_handle
Then, to iterate over all the lines of all those files, you'd just use itertools.chain.from_iterable on the result of that generator.
Next, you can use re.compile(LOG_LINE_REGEX) to get a compiled pattern. This can give a small performance boost, and it lets you pass its .search method straight to map:
log_line_re = re.compile(LOG_LINE_REGEX)
all_log_lines = itertools.chain.from_iterable(filter_files(directory_path, LOG_FILE_EXTENSION))
for line_groups in map(log_line_re.search, all_log_lines):
    if line_groups and line_groups.group(8) == "TLSv1":
        yield line_groups.group(2)
This will be a generator that produces every line_groups.group(2) value that meets the other conditions, so to count the frequencies you'd just construct a collections.Counter from its result.
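To see the chain / map / Counter pipeline in isolation, here is a minimal sketch that runs on in-memory lists of lines instead of real files. The two-group pattern SAMPLE_RE and the name tlsv1_ips are made up for illustration; the real LOG_LINE_REGEX isn't shown in the question, so groups 1 and 2 here merely stand in for its groups 2 and 8.

```python
import collections
import itertools
import re

# Hypothetical stand-in pattern: "<ip> <protocol>". Not the asker's LOG_LINE_REGEX.
SAMPLE_RE = re.compile(r"^(\S+) (\S+)$")


def tlsv1_ips(line_sources):
    """Yield the IP of every line whose protocol field is exactly "TLSv1"."""
    # Flatten an iterable of line iterables into one lazy stream of lines.
    all_lines = itertools.chain.from_iterable(line_sources)
    # map() applies the compiled pattern's search method to each line.
    for match in map(SAMPLE_RE.search, all_lines):
        if match and match.group(2) == "TLSv1":
            yield match.group(1)


# Two in-memory "files"; open file objects would work the same way.
fake_files = [
    ["10.0.0.1 TLSv1\n", "10.0.0.2 TLSv1.2\n"],
    ["10.0.0.1 TLSv1\n", "not a log line\n"],
]

# Counter consumes the generator and tallies each yielded IP.
counts = collections.Counter(tlsv1_ips(fake_files))
print(counts)
```

Nothing is read until Counter starts pulling values, so only one line is held in memory at a time.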
So the final code would be like this:
import collections
import itertools
import os
import re

def filter_files(directory, extension):
    # Each file stays open while its lines are consumed and is closed
    # when the generator is resumed for the next file.
    for filename in os.listdir(directory):
        if filename.endswith(extension):
            with open(os.path.join(directory, filename)) as file_handle:
                yield file_handle

def get_part_of_log_files():
    log_line_re = re.compile(LOG_LINE_REGEX)
    all_log_lines = itertools.chain.from_iterable(filter_files(directory_path, LOG_FILE_EXTENSION))
    for line_groups in map(log_line_re.search, all_log_lines):
        if line_groups and line_groups.group(8) == "TLSv1":
            yield line_groups.group(2)

def original_function():
    return collections.Counter(get_part_of_log_files())
Upvotes: 0
Reputation: 32094
If you work with Python 3.4+, you can use the pathlib module:
import re
from pathlib import Path
from collections import Counter

ip_tlsv1_counts = Counter()
for path in Path(directory_path).glob('*' + LOG_FILE_EXTENSION):
    with path.open() as f1:
        for line in f1:
            line_groups = re.search(LOG_LINE_REGEX, line)
            if line_groups and line_groups.group(8) == "TLSv1":
                ip_tlsv1_counts[line_groups.group(2)] += 1
return ip_tlsv1_counts
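For something you can run as-is, here is a sketch of the same pathlib approach wrapped in a function and tried against a temporary directory. The function name count_tlsv1_ips, the sample file names, and the simplified two-group LINE_RE are all assumptions for the demo; with the real LOG_LINE_REGEX you would go back to group(2) and group(8).

```python
import re
import tempfile
from collections import Counter
from pathlib import Path

# Hypothetical stand-in pattern: "<ip> <protocol>". Not the asker's LOG_LINE_REGEX.
LINE_RE = re.compile(r"^(\S+) (\S+)$")


def count_tlsv1_ips(directory_path, extension):
    """Count, per IP, the lines in matching files whose protocol is "TLSv1"."""
    counts = Counter()
    # glob() only yields paths whose names end with the extension.
    for path in Path(directory_path).glob("*" + extension):
        with path.open() as log_file:
            for line in log_file:
                match = LINE_RE.search(line)
                if match and match.group(2) == "TLSv1":
                    # Counter returns 0 for missing keys, so no "in" check is needed.
                    counts[match.group(1)] += 1
    return counts


# Build a throwaway directory with two log files and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.log").write_text("10.0.0.1 TLSv1\n10.0.0.2 TLSv1.2\n")
    Path(tmp, "b.log").write_text("10.0.0.1 TLSv1\n")
    Path(tmp, "skip.txt").write_text("10.0.0.9 TLSv1\n")
    pathlib_counts = count_tlsv1_ips(tmp, ".log")

print(pathlib_counts)
```

Passing the directory and extension as arguments rather than reading module-level names makes the function easy to try on a scratch directory like this.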
Upvotes: 1