physlexic

Reputation: 858

Can't replace string with variable

I came up with the code below, which finds a string in a row and copies that row to a new file. I want to replace Foo23 with something more dynamic (e.g. a pattern such as [0-9]), but I cannot get variables or regex to work. The code doesn't fail, but it also produces no results. Help? Thanks.

with open('C:/path/to/file/input.csv') as f:
    with open('C:/path/to/file/output.csv', "w") as f1:
        for line in f:
            if "Foo23" in line:
                f1.write(line)

Upvotes: 0

Views: 145

Answers (2)

Engineero

Reputation: 12908

Based on your comment, you want to match lines whenever any three letters followed by two digits are present, e.g. foo12 and bar54. Use regex!

import re
pattern = r'([a-zA-Z]{3}\d{2})\b'
for line in f:
    if re.findall(pattern, line):
        f1.write(line)

This will match lines like 'some line foo12' and 'another foo54 line', but not 'a third line foo' or 'something bar123'.
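A quick way to check those four examples is a small sketch like the one below. The sample strings are the ones mentioned above; has_code is only an illustrative helper name, not something from the question.

```python
import re

pattern = r'([a-zA-Z]{3}\d{2})\b'

def has_code(line):
    """Return True if the line contains three letters followed by
    two digits that end at a word boundary."""
    return re.search(pattern, line) is not None

samples = [
    'some line foo12',
    'another foo54 line',
    'a third line foo',
    'something bar123',
]

# Map each sample string to whether the pattern was found in it
results = {s: has_code(s) for s in samples}
for s, ok in results.items():
    print(f'{s!r}: {ok}')
```

The last sample is rejected because the \b has to come right after the second digit, and in bar123 the second digit is followed by another digit rather than a word boundary.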

Breaking it down:

pattern = re.compile(r'''
    (              # start capture group, not needed here, but nice if you want the actual match back
      [a-zA-Z]{3}  # any three letters in a row, any case
      \d{2}        # any two digits
    )              # end capture group
    \b             # word boundary (e.g. before white space, punctuation, or end of line)
''', re.VERBOSE)   # re.VERBOSE lets the pattern contain white space and comments

If all you really need is to write all of the matches in the file to f1, you can use:

matches = re.findall(pattern, f.read())  # finds all matches in f
f1.write('\n'.join(matches))  # writes each match to a new line in f1
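Note that this variant writes only the matched substrings, not the whole lines. A minimal sketch of that behaviour, using a made-up in-memory string in place of f.read() (the sample text is an assumption for illustration, not the asker's data):

```python
import re

pattern = r'([a-zA-Z]{3}\d{2})\b'

# Hypothetical file contents standing in for f.read()
text = 'id,code\n1,foo12\n2,bar123\n3,baz54 and qux07\n'

# With one capture group, findall returns a list of the captured strings
matches = re.findall(pattern, text)

# One match per line, as it would be written to f1
output = '\n'.join(matches)
print(output)
```

Here matches is ['foo12', 'baz54', 'qux07']: bar123 is skipped, and a line with two matches contributes two entries.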

Upvotes: 1

Woody1193

Reputation: 7970

In essence, your question boils down to: "I want to determine whether a string matches pattern X and, if so, write it to the file." The best way to accomplish this is with a regular expression (regex). In Python, the standard regex library is re. So,

import re
matches = re.findall(r'([a-zA-Z]{3}\d{2})', line)

Combining this with file IO operations, we have:

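import re

# Read every line of the input file into memory
with open('C:/path/to/file/input.csv', 'r') as f:
    data = list(f)

# Keep only the lines that contain three letters followed by two digits
data = [line for line in data if re.findall(r'([a-zA-Z]{3}\d{2})\b', line)]

# Write the remaining lines to the output file
with open('C:/path/to/file/output.csv', 'w') as f1:
    for line in data:
        f1.write(line)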

Notice that I split up your file IO operations to reduce nesting. I also moved the filtering out of the IO blocks. In general, each portion of your code should do "one thing" for ease of testing and maintenance.
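One way to take the "one thing" idea a step further is to put the filtering in its own function so it can be tested without any files. This is only a sketch: the names CODE_PATTERN, matching_lines and filter_file are made up for illustration, and it uses re.search on a precompiled pattern instead of re.findall.

```python
import re

# Three letters followed by two digits ending at a word boundary
CODE_PATTERN = re.compile(r'[a-zA-Z]{3}\d{2}\b')

def matching_lines(lines, pattern=CODE_PATTERN):
    """Yield only the lines in which the pattern is found somewhere."""
    for line in lines:
        if pattern.search(line):
            yield line

def filter_file(src_path, dst_path, pattern=CODE_PATTERN):
    """Copy the lines of src_path that match the pattern into dst_path."""
    with open(src_path) as src, open(dst_path, 'w') as dst:
        dst.writelines(matching_lines(src, pattern))

# The filter can be exercised on a plain list, without touching the disk
kept = list(matching_lines(['a,Foo23\n', 'b,nothing\n', 'c,bar54,x\n']))
print(kept)
```

With the paths from the question, the call would be filter_file('C:/path/to/file/input.csv', 'C:/path/to/file/output.csv'). Because matching_lines is a generator, the input is streamed line by line instead of being loaded into a list first.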

Upvotes: 1
