SIM

Reputation: 22440

Can't stop my script from grabbing unnecessary lines

I've written a script in Python to grab certain lines from a block of text, using the re module. However, it gives me unnecessary output along with the lines I need.

How can I modify my expression so that it sticks to the lines I want to grab?

This is my try:

import re

content = """
A Gross exaggeration,
-- Gross   5 90,630,08,
Gross      4 13,360,023,
      Gross      2 70,940,02,
Luke gross is an actor
"""
for item in re.finditer(r'Gross(?:[\d\s,]*)',content):
    print(item.group().strip())

Output I'm getting:

Gross
Gross   5 90,630,08,
Gross      4 13,360,023,
Gross      2 70,940,02,

Output I wish to have:

Gross      4 13,360,023
Gross      2 70,940,02

Upvotes: 0

Views: 22

Answers (2)

Andrej Kesely

Reputation: 195573

I changed the regex to r'(?:^\s*?)Gross[\d\s,]*?(?=,$)' and added the multiline flag (re.M), so that ^ and $ match at the start and end of every line:

import re

content = """
A Gross exaggeration,
-- Gross   5 90,630,08,
Gross      4 13,360,023,
      Gross      2 70,940,02,
Luke gross is an actor
"""

for item in re.finditer(r'(?:^\s*?)Gross[\d\s,]*?(?=,$)',content, flags=re.M):
    print(item.group().strip())

Output is:

Gross      4 13,360,023
Gross      2 70,940,02
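
To see what each piece of that pattern does, it can be spelled out with re.VERBOSE. This is only a sketch of the same expression with comments attached; the names PATTERN and grab_gross_lines are made up for the illustration and are not part of the answer above.

```python
import re

content = """
A Gross exaggeration,
-- Gross   5 90,630,08,
Gross      4 13,360,023,
      Gross      2 70,940,02,
Luke gross is an actor
"""

# PATTERN and grab_gross_lines are illustrative names (assumptions).
PATTERN = re.compile(
    r"""
    (?:^\s*?)     # start of a line (needs re.M), then optional leading whitespace
    Gross         # the literal word; case-sensitive, so "gross" is skipped
    [\d\s,]*?     # digits, whitespace and commas, as few as possible
    (?=,$)        # stop right before a comma that ends the line
    """,
    flags=re.M | re.X,
)

def grab_gross_lines(text):
    # strip() drops the leading indentation captured by \s*?
    return [m.group().strip() for m in PATTERN.finditer(text)]

for line in grab_gross_lines(content):
    print(line)
```

"A Gross exaggeration," and "-- Gross ..." are rejected because only whitespace may sit between the start of the line and "Gross".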

Upvotes: 1

emsimpson92

Reputation: 1778

^\s*Gross[\d ,]*(?=,) will capture what you want.

I added ^ to anchor the match at the start of a line, used \s* to allow optional whitespace before "Gross", and used a lookahead to leave the trailing , out of the match. I also replaced the \s in your character class with a plain space, because \s matches newlines too and let the match run across lines. Note that ^ only matches at the start of each line when the multiline flag (re.M) is set.
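
The answer does not show the Python call, so here is a minimal sketch of how the pattern might be used, assuming re.MULTILINE is passed. The variable names pattern and results are mine.

```python
import re

content = """
A Gross exaggeration,
-- Gross   5 90,630,08,
Gross      4 13,360,023,
      Gross      2 70,940,02,
Luke gross is an actor
"""

pattern = r'^\s*Gross[\d ,]*(?=,)'

# Assumption: re.MULTILINE is set so that ^ matches at every line start.
results = [m.group().strip()
           for m in re.finditer(pattern, content, flags=re.MULTILINE)]

for line in results:
    print(line)
```

[\d ,]* is greedy, so it first swallows the final comma and then backtracks one character to satisfy the lookahead, which is why the trailing comma is left out. Without the flag, ^ matches only at the very start of the string and this sample produces no matches at all.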


Upvotes: 0
