Ethan

Reputation: 1266

Python regular expression issues

So I'm trying to get better at Python in general, but I'm having some trouble using the re module for regular expressions.

I have a comma-separated CSV file that I'm reading in, and I want to find every line that ends in ",5". So I used the code below:

    five_rating = re.compile(r",5$", re.MULTILINE)
    print five_rating.findall(file.read())

but I don't get any output. There are definitely occurrences that match the regular expression I'm using. I've tested my regex on Python regex websites and it matches what I want there, but in code it just doesn't work!

Is there something obvious I'm doing wrong here?

Oh and I'm using Ubuntu and the file should have DOS style line endings, but I tried converting the end-line characters using the code from this post and it didn't do the trick.

btw here's a sample of the input:

9605,Ace Ventura: Pet Detective,5
9606,Ace Ventura: Pet Detective,1
9607,Ace Ventura: Pet Detective,4
9608,Ace Ventura: Pet Detective,3
9609,Ace Ventura: Pet Detective,2
9610,Ace Ventura: Pet Detective,4
9611,Ace Ventura: Pet Detective,3
9612,Ace Ventura: Pet Detective,4
9613,Ace Ventura: Pet Detective,5
9614,Ace Ventura: Pet Detective,5
9615,Ace Ventura: Pet Detective,4
9616,Ace Ventura: Pet Detective,1
9617,Ace Ventura: Pet Detective,3
9618,Ace Ventura: Pet Detective,4
9619,Ace Ventura: Pet Detective,3
9620,Ace Ventura: Pet Detective,1
9621,Ace Ventura: Pet Detective,2
9622,Ace Ventura: Pet Detective,3
9623,Ace Ventura: Pet Detective,5
9624,Ace Ventura: Pet Detective,2
9625,Ace Ventura: Pet Detective,2
9626,Ace Ventura: Pet Detective,4
9627,Ace Ventura: Pet Detective,3
9628,Ace Ventura: Pet Detective,1

Upvotes: 2

Views: 146

Answers (2)

dawg

Reputation: 103754

Given your input (which could be a file) as a multiline string, like this:

st='''9605,Ace Ventura: Pet Detective,5
9606,Ace Ventura: Pet Detective,1
9607,Ace Ventura: Pet Detective,4
9608,Ace Ventura: Pet Detective,3
9609,Ace Ventura: Pet Detective,2
9610,Ace Ventura: Pet Detective,4
9611,Ace Ventura: Pet Detective,3
9612,Ace Ventura: Pet Detective,4
9613,Ace Ventura: Pet Detective,5
9614,Ace Ventura: Pet Detective,5
9615,Ace Ventura: Pet Detective,4
9616,Ace Ventura: Pet Detective,1
9617,Ace Ventura: Pet Detective,3
9618,Ace Ventura: Pet Detective,4
9619,Ace Ventura: Pet Detective,3
9620,Ace Ventura: Pet Detective,1
9621,Ace Ventura: Pet Detective,2
9622,Ace Ventura: Pet Detective,3
9623,Ace Ventura: Pet Detective,5
9624,Ace Ventura: Pet Detective,2
9625,Ace Ventura: Pet Detective,2
9626,Ace Ventura: Pet Detective,4
9627,Ace Ventura: Pet Detective,3
9628,Ace Ventura: Pet Detective,1'''

This works:

import re

# Check each line on its own; print() with one argument runs on Python 2 and 3
for line in st.splitlines():
    m = re.search(r'(^.*,5$)', line)
    if m:
        print(m.group(0))

or a re.findall version:

print(re.findall(r'(^.*,5$)', st, re.MULTILINE))

or (somewhat confusingly IMHO) re.findall will work without parens:

print(re.findall(r'^.*,5$', st, re.MULTILINE))

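Your pattern has no .* (meaning 'match everything up to the ,5'), so even when it matches, findall returns only the ',5' fragment rather than the whole line. If the file really has DOS line endings, the '\r' before each '\n' also stops $ from matching right after the 5, which would explain getting no matches at all.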
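
To make the effect of the .* concrete, here is a small sketch on a shortened copy of the sample. It assumes Unix (LF) line endings, and the names sample, short_matches and full_matches are just illustrative:

```python
import re

# Shortened sample from the question, with Unix (LF) line endings (an assumption).
sample = (
    "9605,Ace Ventura: Pet Detective,5\n"
    "9606,Ace Ventura: Pet Detective,1\n"
    "9613,Ace Ventura: Pet Detective,5"
)

# Without .*, findall returns only the text covered by the pattern itself.
short_matches = re.findall(r",5$", sample, re.MULTILINE)

# With ^.*, each match spans the whole line.
full_matches = re.findall(r"^.*,5$", sample, re.MULTILINE)

print(short_matches)
print(full_matches)
```

On this input the first call yields only ',5' fragments, while the second yields the two full lines.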

Also, as stated in one of the comments, using file as an identifier is a bad idea, since it shadows the built-in of the same name in Python 2.

You can also use Python's string processing to do this:

for line in st.splitlines():
    if line.endswith(',5'):
        print(line)

And if you really have a CSV file to process, use the built-in csv module.
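
A minimal sketch of the csv approach, assuming Python 3 and a three-column layout of id, title, rating (the question does not name the columns). An in-memory io.StringIO stands in for the real file so the example runs on its own:

```python
import csv
import io

# Stand-in for the real file, with DOS line endings. newline="" leaves the
# line endings untouched so the csv module can handle them itself; the csv
# docs recommend the same setting when opening real files.
data = io.StringIO(
    "9605,Ace Ventura: Pet Detective,5\r\n"
    "9606,Ace Ventura: Pet Detective,1\r\n"
    "9613,Ace Ventura: Pet Detective,5\r\n",
    newline="",
)

# Assumption: the rating is the last field of each row.
five_star = [row for row in csv.reader(data) if row and row[-1] == "5"]

for row in five_star:
    print(row)
```

Because the reader splits fields and strips the line terminator, the comparison is against the bare string "5" and no regex is needed.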


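Finally, if you have a DOS file on *nix, just use Python's universal newline support by opening it with 'U' in the mode. (In Python 3, text mode translates line endings by default, and the 'U' flag was removed in 3.11.)

with open(...,'rU') as infile:
    text = infile.read()  # '\r\n' has been translated to '\n'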
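
To see why DOS line endings matter for the original pattern, here is a short sketch on a two-line sample, assuming the text was read without newline translation. The variable names are illustrative only:

```python
import re

# Two-line sample with DOS (CRLF) line endings, as the text would look if it
# were read without any newline translation.
dos_text = (
    "9605,Ace Ventura: Pet Detective,5\r\n"
    "9606,Ace Ventura: Pet Detective,1\r\n"
)

# '$' matches just before '\n', but here a '\r' sits between the 5 and the '\n'.
strict = re.findall(r",5$", dos_text, re.MULTILINE)

# Allowing an optional '\r' before the line end tolerates both conventions;
# the group keeps the '\r' out of the result.
tolerant = re.findall(r"^(.*,5)\r?$", dos_text, re.MULTILINE)

# Alternatively, normalize the line endings before matching.
normalized = re.findall(r"^.*,5$", dos_text.replace("\r\n", "\n"), re.MULTILINE)

print(strict, tolerant, normalized)
```

Here the strict pattern finds nothing, while the tolerant and normalized variants both find the first line.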

Upvotes: 1

Nicolas

Reputation: 5668

Note that you don't really need a regex here (strip() also removes a trailing '\r' left over from DOS line endings):

with open('file') as f:
    # Iterate over the file directly; keep only lines ending in ',5'
    lines = [l.strip() for l in f if l.strip().endswith(',5')]

print(lines)
# ['9605,Ace Ventura: Pet Detective,5', '9613,Ace Ventura: Pet Detective,5', '9614,Ace Ventura: Pet Detective,5', '9623,Ace Ventura: Pet Detective,5']

Upvotes: 1
