Jill

Reputation: 1

Reading specific words from an online source - Python

Here's the content of the text file abc.txt

This is before the start and should be ignored.
So should this
and this


*** START OF SYNTHETIC TEST CASE ***
a ba bac
*** END OF SYNTHETIC TEST CASE ***

This is after the end and should be ignored too.
Have a nice day.

I need to write a function, get_words_from_file(filename), that returns a list of lower case words as shown in the sample case below. The function should only process lines between the start and end marker lines and use the definition of words provided below.

I am provided with the following regular expression, which describes what is required. I am not expected to understand how regular expressions work; I just need to know that the call to findall given below returns a list of the relevant words from a given line string.

words_on_line = re.findall("[a-z]+[-'][a-z]+|[a-z]+[']?|[a-z]+", line)

- Include all lower-case character sequences, including those that contain a - or ' character and those that end with a ' character.
- Words that end with a - MUST NOT be included.
- The words should be in the same order as they occur in the file.
- There must be no more than 9 CONSTANTS declared.
- Functions must be no longer than 20 statements.
- Functions must not have more than 3 parameters.
Test Code:

filename = "abc.txt"
words2 = get_words_from_file(filename)
print(filename, "loaded ok.")
print("{} valid words found.".format(len(words2)))
print("Valid word list:")
print("\n".join(words2))

Expected Output:

abc.txt loaded ok.
3 valid words found.
Valid word list:
a
ba
bac

My Code is as follows:

def stripped_lines(lines):
    for line in lines:
        stripped_line = line.rstrip('\n')
        yield stripped_line

def lines_from_file(fname):
    with open(fname, 'rt', encoding='utf8') as flines:
        for line in stripped_lines(flines):
            yield line

def is_marker_line(line, start='***', end='***'):
    '''
    Marker lines start and end with the given strings, which may not
    overlap.  (A line containing just '***' is not a valid marker line.)
    '''
    min_len = len(start) + len(end)
    if len(line) < min_len:
        return False
    return line.startswith(start) and line.endswith(end)

def advance_past_next_marker(lines):
    '''
    Advances the given iterator through the first encountered marker
    line, if any.
    '''
    for line in lines:
        if is_marker_line(line):
            break

def lines_before_next_marker(lines):
    '''
    Yields all lines up to but not including the next marker line.  If
    no marker line is found, yields no lines.
    '''
    valid_lines = []
    for line in lines:
        if is_marker_line(line):
            break
        valid_lines.append(line)
    else:
        # `for` loop did not break, meaning there was no marker line.
        valid_lines = []
    for content_line in valid_lines:
        yield content_line

def lines_between_markers(lines):
    '''
    Yields the lines between the first two marker lines.
    '''
    # Must use the iterator --- if it's merely an iterable (like a list
    # of strings), the call to lines_before_next_marker will restart
    # from the beginning.
    it = iter(lines)
    advance_past_next_marker(it)
    for line in lines_before_next_marker(it):
        yield line

def words(lines):
    text = '\n'.join(lines).lower().split()
    # Same as before...

def get_words_from_file(fname):
    for word in words(lines_between_markers(lines_from_file(fname))):
        return word

filename = "abc.txt"
words2 = get_words_from_file(filename)
print(filename, "loaded ok.")
print("{} valid words found.".format(len(words2)))
print("Valid word list:")
print("\n".join(words2))

My Crappy Output

 Traceback (most recent call last):
  File "C:/Users/Jill/SQ4.1(2).py", line 67, in <module>
    words2 = get_words_from_file(filename)
  File "C:/Users/Jason/SQ4.1(2).py", line 63, in <module>
    for word in words(lines_between_markers(lines_from_file(fname))):
builtins.TypeError: 'NoneType' object is not iterable

Could you help me correct my code? I am at a total loss.

Upvotes: 0

Views: 55

Answers (1)

Reddysekhar Gaduputi

Reputation: 490

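I have changed the original code a bit; try the version below. In the original, words() never returned its list, so it returned None and the for loop in get_words_from_file() raised the TypeError. That loop would also have returned only the first word. Both functions now return the full list.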

def stripped_lines(lines):
    for line in lines:
        stripped_line = line.rstrip('\n')
        yield stripped_line


def lines_from_file(fname):
    with open(fname, 'rt') as flines:
        for line in stripped_lines(flines):
            yield line


def is_marker_line(line, start='***', end='***'):
    '''
    Marker lines start and end with the given strings, which may not
    overlap.  (A line containing just '***' is not a valid marker line.)
    '''
    min_len = len(start) + len(end)
    if len(line) < min_len:
        return False
    return line.startswith(start) and line.endswith(end)


def advance_past_next_marker(lines):
    '''
    Advances the given iterator through the first encountered marker
    line, if any.
    '''
    for line in lines:
        if is_marker_line(line):
            break


def lines_before_next_marker(lines):
    '''
    Yields all lines up to but not including the next marker line.  If
    no marker line is found, yields no lines.
    '''
    valid_lines = []
    for line in lines:
        if is_marker_line(line):
            break
        valid_lines.append(line)
    else:
        # `for` loop did not break, meaning there was no marker line.
        valid_lines = []
    for content_line in valid_lines:
        yield content_line


def lines_between_markers(lines):
    '''
    Yields the lines between the first two marker lines.
    '''
    # Must use the iterator --- if it's merely an iterable (like a list
    # of strings), the call to lines_before_next_marker will restart
    # from the beginning.
    it = iter(lines)
    advance_past_next_marker(it)
    for line in lines_before_next_marker(it):
        yield line


def words(lines):
    text = '\n'.join(lines).lower().split()
    return text

def get_words_from_file(fname):
    return words(lines_between_markers(lines_from_file(fname)))

filename = "abc.txt"
all_words = get_words_from_file(filename)
print(filename, "loaded ok.")
print("{} valid words found.".format(len(all_words)))
print("Valid word list:")
print("\n".join(all_words))

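The output is shown below. The first line appears as a tuple because this was apparently run under Python 2; Python 3 prints abc.txt loaded ok. instead.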

('abc.txt', 'loaded ok.')
3 valid words found.
Valid word list:
a
ba
bac
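
The code above splits on whitespace, which matches the expected output for abc.txt but does not apply the word definition from the question. A minimal sketch that uses the re.findall pattern from the question instead might look like the following. The helper name words_between_markers and the marker prefixes "*** START" and "*** END" are assumptions based on the sample file, not something the assignment specifies.

```python
import re

# Pattern copied from the question.
WORD_PATTERN = "[a-z]+[-'][a-z]+|[a-z]+[']?|[a-z]+"
# Assumed marker prefixes, based on the sample abc.txt.
START_PREFIX = "*** START"
END_PREFIX = "*** END"


def words_between_markers(lines):
    """Return the words on lines between the start and end marker lines."""
    found = []
    inside = False
    for line in lines:
        if not inside:
            # Skip everything up to and including the start marker.
            inside = line.startswith(START_PREFIX)
        elif line.startswith(END_PREFIX):
            break
        else:
            found.extend(re.findall(WORD_PATTERN, line.lower()))
    return found


def get_words_from_file(filename):
    """Read the file and return the words between its marker lines."""
    with open(filename, encoding="utf8") as infile:
        return words_between_markers(infile)


# Quick check on in-memory lines that mirror abc.txt.
sample = [
    "This is before the start and should be ignored.\n",
    "*** START OF SYNTHETIC TEST CASE ***\n",
    "a ba bac\n",
    "*** END OF SYNTHETIC TEST CASE ***\n",
    "Have a nice day.\n",
]
print(words_between_markers(sample))  # ['a', 'ba', 'bac']
```

Keeping the line-processing logic in a function that accepts any iterable of lines makes it easy to test without a file on disk, and get_words_from_file() stays a thin wrapper with a single parameter.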

Upvotes: 1
