JosiP

Reputation: 3079

'negative' pattern matching in python

I have the following input,

OK SYS 10 LEN 20 12 43
1233a.fdads.txt,23 /data/a11134/a.txt
3232b.ddsss.txt,32 /data/d13f11/b.txt
3452d.dsasa.txt,1234 /data/c13af4/f.txt
.

And I'd like to extract all of the input except the line containing "OK SYS 10 LEN 20" and the last line which contains a single "." (dot). That is, I want to extract the following

1233a.fdads.txt,23 /data/a11134/a.txt
3232b.ddsss.txt,32 /data/d13f11/b.txt
3452d.dsasa.txt,1234 /data/c13af4/f.txt

I tried the following,

for item in output:
    match_obj = re.search("^(?!OK) | ^(?!\\.)", item)
    if match_obj :
        print("got item " + item)

but it does not work, as it does not produce any output.

Upvotes: 41

Views: 194395

Answers (8)

Jochen Ritzel

Reputation: 107786

if not (line.startswith("OK ") or line.strip() == "."):
    print(line)

Upvotes: 6

Pablo Jomer

Reputation: 10428

Why not match the OK SYS row and the lone dot, and skip those lines?

for item in output:
    # Anchor the pattern: an unanchored "\." would also match the dots in the file names.
    match_obj = re.match(r"OK SYS|\.$", item)
    if not match_obj:
        print("got item " + item)

Upvotes: 4

Marcelo Cantos

Reputation: 186118

Use a negative match. (Also note that whitespace is significant, by default, inside a regex so don't space things out. Alternatively, use re.VERBOSE.)

for item in output:
    match_obj = re.search("^(OK|\\.)", item)
    if not match_obj:
        print("got item " + item)
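As a rough sketch of the re.VERBOSE alternative: with that flag, unescaped whitespace and # comments inside the pattern are ignored, so the alternatives can be spaced out safely. The sample `output` list and the names `SKIP` and `kept` are assumptions made up for this illustration.

```python
import re

# Sample input, assumed to be a list of lines as in the question.
output = [
    "OK SYS 10 LEN 20 12 43",
    "1233a.fdads.txt,23 /data/a11134/a.txt",
    "3232b.ddsss.txt,32 /data/d13f11/b.txt",
    "3452d.dsasa.txt,1234 /data/c13af4/f.txt",
    ".",
]

# With re.VERBOSE, whitespace and comments in the pattern are ignored,
# so this is equivalent to the compact pattern ^(?:OK|\.)
SKIP = re.compile(
    r"""
    ^ (?: OK    # header line starting with OK
      | \.      # or a line starting with a dot
    )
    """,
    re.VERBOSE,
)

# Negative match: keep only the lines the pattern does NOT match.
kept = [item for item in output if not SKIP.search(item)]
for item in kept:
    print("got item " + item)
```

Without re.VERBOSE, the same spaced-out pattern would require literal spaces in the input, which is why the pattern in the question never matches.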

Upvotes: 9

mmdemirbas

Reputation: 9168

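Use a negative look-ahead:

match_obj = re.search(r"^(?!OK|\.).*", item)

Don't forget to put .* after the negative look-ahead; without it the pattern still matches, but the match is empty, so match_obj.group() gives you an empty string instead of the line.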
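To make the role of the trailing .* concrete, here is a small sketch that runs the look-ahead with and without it. The `lines` list and the names `with_tail`, `without_tail` and `kept` are assumptions for the example, not part of the question.

```python
import re

# Sample input, assumed to be a list of lines as in the question.
lines = [
    "OK SYS 10 LEN 20 12 43",
    "1233a.fdads.txt,23 /data/a11134/a.txt",
    "3232b.ddsss.txt,32 /data/d13f11/b.txt",
    "3452d.dsasa.txt,1234 /data/c13af4/f.txt",
    ".",
]

with_tail = re.compile(r"^(?!OK|\.).*")    # look-ahead, then consume the line
without_tail = re.compile(r"^(?!OK|\.)")   # look-ahead only, consumes nothing

# Both patterns reject the header and the lone dot.
header_match = with_tail.search(lines[0])
dot_match = without_tail.search(lines[4])

# On a data line both succeed, but only the first one captures any text.
full_text = with_tail.search(lines[1]).group()
empty_text = without_tail.search(lines[1]).group()

# Collect the text of every line that passes the look-ahead.
kept = []
for line in lines:
    match_obj = with_tail.search(line)
    if match_obj:
        kept.append(match_obj.group())

print(kept)
```

If you only test the match object for truthiness, either pattern filters the lines; the .* matters when you want the matched text back.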

Upvotes: 64

vt220

Reputation: 1

If the OK line is always the first line and the dot is always the last line, you could consider slicing them off like this:

TestString = '''OK SYS 10 LEN 20 12 43
1233a.fdads.txt,23 /data/a11134/a.txt
3232b.ddsss.txt,32 /data/d13f11/b.txt
3452d.dsasa.txt,1234 /data/c13af4/f.txt
.
'''
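# splitlines() splits on line breaks only; split() would also break each line apart at its spaces.
print('\n'.join(TestString.splitlines()[1:-1]))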

However, if this is a very large string, you may run into memory problems.
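If memory is the concern, one possible workaround is to stream the lines and hold back only one at a time. This is a minimal sketch under the same assumption as above (header first, dot last); `drop_first_and_last` is a hypothetical helper name, and io.StringIO merely stands in for a real file object.

```python
import io


def drop_first_and_last(lines):
    """Yield every line except the first and the last, buffering one line."""
    iterator = iter(lines)
    next(iterator, None)             # discard the first line (the OK header)
    previous = next(iterator, None)  # first candidate line, held back
    if previous is None:
        return                       # zero or one line: nothing to yield
    for current in iterator:
        yield previous               # safe to emit: a later line exists
        previous = current
    # The last held-back line (the lone dot) is never yielded.


# io.StringIO stands in for an open file here.
stream = io.StringIO(
    "OK SYS 10 LEN 20 12 43\n"
    "1233a.fdads.txt,23 /data/a11134/a.txt\n"
    "3232b.ddsss.txt,32 /data/d13f11/b.txt\n"
    "3452d.dsasa.txt,1234 /data/c13af4/f.txt\n"
    ".\n"
)

result = [line.rstrip("\n") for line in drop_first_and_last(stream)]
for line in result:
    print(line)
```

Because the generator only ever keeps one pending line, it works on an open file of any size without building the full list first.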

Upvotes: 0

Alex Misiulia

Reputation: 1820

You can also do it without a negative look-ahead. You just need to put parentheses around the part of the expression that you want to extract. This construction with parentheses is called a group.

Let's write the Python code:

import re

string = """OK SYS 10 LEN 20 12 43
1233a.fdads.txt,23 /data/a11134/a.txt
3232b.ddsss.txt,32 /data/d13f11/b.txt
3452d.dsasa.txt,1234 /data/c13af4/f.txt
.
"""

# The final dot is escaped so that it matches a literal "." rather than any character.
search_result = re.search(r"^OK.*\n((.|\s)*)\.", string)

if search_result:
    print(search_result.group(1))

Output is:

1233a.fdads.txt,23 /data/a11134/a.txt
3232b.ddsss.txt,32 /data/d13f11/b.txt
3452d.dsasa.txt,1234 /data/c13af4/f.txt

^OK.*\n finds the first line, which starts with OK, but we don't want to extract it, so we leave it outside the parentheses. Next comes the part we want to capture, ((.|\s)*), so we put it inside parentheses. At the end of the regexp we look for the closing dot, which we don't want to capture either.
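For comparison, here is a variation on the same group idea. Note that this is a different pattern, not the one explained above: it swaps (.|\s)* for .* under re.DOTALL and uses re.MULTILINE to pin the final dot to a line of its own. The names `pattern` and `captured` are assumptions for this sketch.

```python
import re

string = """OK SYS 10 LEN 20 12 43
1233a.fdads.txt,23 /data/a11134/a.txt
3232b.ddsss.txt,32 /data/d13f11/b.txt
3452d.dsasa.txt,1234 /data/c13af4/f.txt
.
"""

# [^\n]* keeps the header match on one line even though DOTALL lets "." cross
# line breaks; ^\.$ (with MULTILINE) only accepts a dot that sits alone on a line.
pattern = re.compile(r"^OK[^\n]*\n(.*)^\.$", re.DOTALL | re.MULTILINE)

search_result = pattern.search(string)
# group(1) holds everything between the header line and the lone dot.
captured = search_result.group(1).splitlines() if search_result else []

for line in captured:
    print(line)
```

Anchoring the dot this way means the dots inside the file names can never be mistaken for the terminator.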

P.S.: I found this answer very helpful for understanding the power of groups: https://stackoverflow.com/a/3513858/4333811

Upvotes: 0

hkn06tr

Reputation: 11

Checking that re.search() returns None also works as a negative match:

if re.search("bla_bla_pattern", str_item, re.IGNORECASE) is None:
    print(str_item)

Upvotes: 1

Burhan Khalid

Reputation: 174758

You can read the input with csv and keep only the rows that have exactly two fields, which skips the first and last lines:

>>> s = """OK SYS 10 LEN 20 12 43
... 1233a.fdads.txt,23 /data/a11134/a.txt
... 3232b.ddsss.txt,32 /data/d13f11/b.txt
... 3452d.dsasa.txt,1234 /data/c13af4/f.txt
... ."""
>>> import csv, io
>>> stream = io.StringIO(s)
>>> rows = [row for row in csv.reader(stream,delimiter=',') if len(row) == 2]
>>> rows
[['1233a.fdads.txt', '23 /data/a11134/a.txt'], ['3232b.ddsss.txt', '32 /data/d13f11/b.txt'], ['3452d.dsasa.txt', '1234 /data/c13af4/f.txt']]

If it's a file, then you can do this:

with open('myfile.txt', 'r', newline='') as f:
    rows = [row for row in csv.reader(f, delimiter=',') if len(row) == 2]

Upvotes: 1
