Smith Lo

Reputation: 315

Copy line with search value from another file with Python

I want to use Python to:

  1. Read a line from search_list file.
  2. Iterate through source_file lines.
  3. If a line match is found, copy that whole line from source_file to export_file.
  4. Repeat steps 1-3 until search_list is exhausted.

The contents of source_file are plain text. Sample:

Act of Heroism  Instant 1W  Common  Magali Villeneuve
Adorned Pouncer Creature — Cat 1/1  1W  Rare    Slawomir Maniak
Angel of Condemnation   Creature — Angel 3/3    2WW Rare    Slawomir Maniak

The search_list file is also plain text, with one keyword per line, as in the following example:

Condemnation
Heroism

After spending some time on Stack Overflow, I have the following code, which does not work at the moment:

with open('list.txt', 'r') as search_list, \
        open('source_file.txt', 'r', encoding="utf8") as source_file:

    for line in search_list:
        searchquery = search_list.readlines()

        for line in source_file:
            current_line = line.split()

            if searchquery in current_line:
                print (line)

It prints nothing.

I have tried to figure out what's wrong, but so far I can't find it.

I took a step back and searched for a hard-coded string instead, and it worked:

with open('list.txt', 'r') as search_list, \
        open('source_file.txt', 'r', encoding="utf8") as source_file:

    for line in source_file:        
        if "Heroism" in line:
            print (line)

The result is:

Act of Heroism  Instant 1W  Common  Magali Villeneuve

Could anyone point out what's wrong with my first code sample?

Thank you very much.

Upvotes: 0

Views: 2685

Answers (1)

Izaak van Dongen

Reputation: 2545

I interpreted your question as wanting to output each line of source_file.txt that contains at least one of a set of substrings, where those substrings are listed in another file, search_list.txt. If that is correct, the following code should work for you:

import sys

with open('search_list.txt', 'r') as search_list:
    targets = [line.strip() for line in search_list]

with open('source_file.txt', 'r') as source_file:
    for line in source_file:
        if any(target in line for target in targets):
            sys.stdout.write(line)

where search_list.txt is

Condemnation
Heroism

and source_file.txt is

Act of Heroism Instant 1W Common Magali Villeneuve
Adorned Pouncer Creature — Cat 1/1 1W Rare Slawomir Maniak
Angel of Condemnation Creature — Angel 3/3 2WW Rare Slawomir Maniak

this will correctly output

Act of Heroism Instant 1W Common Magali Villeneuve
Angel of Condemnation Creature — Angel 3/3 2WW Rare Slawomir Maniak

which is each line that contains either 'Condemnation' or 'Heroism'.
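As a side note on why the test in the question never succeeds, here is a small sketch. It assumes that searchquery ends up holding a list (which is what readlines() returns) and reuses one sample line from the question; the result variable names are only illustrative.

```python
# One sample line from source_file, as it would be read from the file.
line = "Act of Heroism  Instant 1W  Common  Magali Villeneuve\n"
current_line = line.split()    # list of whitespace-separated words

# readlines() returns a list of strings, each still ending in "\n".
searchquery = ["Heroism\n"]

# Asks whether the list itself is one of the words: never true here.
list_in_words = searchquery in current_line

# Exact word match against the split words.
word_in_words = "Heroism" in current_line

# Substring test, which is what the answer's code relies on.
substring_in_line = "Heroism" in line

# The trailing newline from the search file spoils the match,
# which is why the answer calls strip() on each target.
unstripped_in_line = "Heroism\n" in line

print(list_in_words, word_in_words, substring_in_line, unstripped_in_line)
```

So even with a single string instead of a list, the keyword would need its newline stripped before the comparison could match.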

This works by first building a list of all the targets and then, for each line in source_file.txt, checking whether any target is a substring of the line. You have to build the list of targets up front because when you iterate over a file in Python each line is 'consumed', so a second for loop over the same file object finds nothing left to read unless you rewind it.
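To see the 'consumed' behaviour in isolation, here is a minimal sketch. It uses io.StringIO as a stand-in for an open text file (an assumption made only so the example runs without any files on disk); a real file object opened with open() behaves the same way when iterated.

```python
import io

# Stand-in for open('search_list.txt'): an in-memory text stream.
search_list = io.StringIO("Condemnation\nHeroism\n")

# The first pass reads every line.
first_pass = [line.strip() for line in search_list]

# The second pass over the same object finds nothing left to read.
second_pass = [line.strip() for line in search_list]

# Rewinding to the start makes the lines available again.
search_list.seek(0)
third_pass = [line.strip() for line in search_list]

print(first_pass)
print(second_pass)
print(third_pass)
```

Reading the targets into a list once, as in the answer's code, avoids having to rewind inside the loop.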

The way the line if any(target in line for target in targets) works is broadly like this:

First, it uses the generator expression target in line for target in targets. This yields the value of target in line (which checks whether target is a substring of line) for each target in targets. It could also effectively be written as

with open('source_file.txt', 'r') as source_file:
    for line in source_file:
        matches = []
        for target in targets:
            matches.append(target in line)
        if any(matches):
            sys.stdout.write(line)

Now, the any function takes an iterable (something like a list) and returns True if any of the values are True (or equivalent to True). It also short-circuits: it actually stops as soon as it meets a True value. This means the code could be rewritten pretty accurately as

with open('source_file.txt', 'r') as source_file:
    for line in source_file:
        for target in targets:
            if target in line:
                sys.stdout.write(line)
                break

(This has to do with the fact that there is a generator expression, which does not evaluate the whole thing at once, but lazily gives one value at a time, meaning no more work will be done than needed.)
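The difference can be made visible with a small sketch. The helper contains and the two log lists are hypothetical names introduced only to record which targets actually get checked.

```python
def contains(target, line, log):
    """Record that target was checked, then do the substring test."""
    log.append(target)
    return target in line

targets = ["Heroism", "Condemnation", "Pouncer"]
line = "Act of Heroism  Instant 1W  Common  Magali Villeneuve"

# Generator expression: any() stops at the first True value.
lazy_log = []
lazy_result = any(contains(t, line, lazy_log) for t in targets)

# List comprehension: every target is checked before any() even starts.
eager_log = []
eager_result = any([contains(t, line, eager_log) for t in targets])

print(lazy_result, lazy_log)
print(eager_result, eager_log)
```

Both calls give the same answer; only the amount of work differs, which matters more as the list of targets grows.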

By the way, [line.strip() for line in search_list] is a list comprehension. This returns a list of line.strip() for each line in search_list. This could be rewritten as

    targets = []
    for line in search_list:
        targets.append(line.strip())

Hopefully that's helped. Here is some useful documentation on how list comprehensions work. I find it can often be useful to start with the simpler examples like [i ** 2 for i in range(10)]. Let me know if you'd like any more clarification.
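Since the question asks for the matching lines to be copied to an export file rather than printed, here is a sketch of how the same approach might be adapted. The function name copy_matching_lines and its parameters are assumptions made for illustration, as is skipping blank keyword lines (an empty string is a substring of every line, so a blank line in the search file would otherwise match everything).

```python
def copy_matching_lines(search_path, source_path, export_path):
    """Copy each line of source_path containing any keyword to export_path.

    Returns the number of lines written.
    """
    with open(search_path, "r", encoding="utf8") as search_list:
        # Strip newlines and drop blank keywords, which would match every line.
        targets = [line.strip() for line in search_list if line.strip()]

    written = 0
    with open(source_path, "r", encoding="utf8") as source_file, \
            open(export_path, "w", encoding="utf8") as export_file:
        for line in source_file:
            if any(target in line for target in targets):
                export_file.write(line)  # line already ends with its newline
                written += 1
    return written
```

With the file names from the question, this could be called as copy_matching_lines('list.txt', 'source_file.txt', 'export_file.txt'), where export_file.txt is an assumed name for the output file.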

Upvotes: 3
