Simply Seth

Reputation: 3496

Python grep: find a pattern, then print a line a fixed number of lines before it

I'm looking to do the equivalent of grep -B14 MMa in Python.

I have a URL that I open, and it spits out many lines. I want to:

  1. find the line that has 'MMa'
  2. then print the 14th line before it

I don't even know where to begin with this.

import urllib
import urllib2

url = "https://longannoyingurl.com"

opts = {
  'action': 'Dump+It'
}
data = urllib.urlencode(opts)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
print response.read()  # gives the full HTML output

Upvotes: 3

Views: 5559

Answers (3)

Simply Seth

Reputation: 3496

Thanks to Dan, I got my result:

import urllib
import urllib2
import re

url = "https://somelongannoyingurl/blah/servlet"
opts = {
  'authid': 'someID',
  'action': 'Dump+It'
}
data = urllib.urlencode(opts)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)

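lines = response.readlines()
r = re.compile(r'MMa')
for i, current in enumerate(lines):
  if r.search(current):
    # 14th line before the match, clamped to the first line
    line = lines[max(0, i-14)].strip()
    # keep only the text after the '>' (assumes exactly one '>' in the line)
    junk, mma = line.split('>')
    print mma.strip()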
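
One caveat about the line.split('>') unpacking above: it raises ValueError unless the line contains exactly one '>'. A minimal sketch of a more forgiving variant follows, assuming the goal is the text after the last '>' on the line. It is written for Python 3 rather than the Python 2 used in this thread, and the helper name text_after_last_gt and the sample lines are made up for illustration.

```python
def text_after_last_gt(line):
    """Return the text after the last '>' in line, or the whole line if there is none."""
    # rpartition always returns a 3-tuple, so the unpacking cannot fail.
    head, sep, tail = line.strip().rpartition('>')
    return tail.strip()


# Hypothetical sample lines, not taken from the real page.
print(text_after_last_gt('<td class="x">  42 '))
print(text_after_last_gt('<b><i>7'))
print(text_after_last_gt('no markup here'))
```

If the line ends with a closing tag, the text after the last '>' is empty, so a real HTML parser is probably a better fit for anything beyond simple lines like these.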


Upvotes: 1

Dan Loewenherz

Reputation: 11236

Instead of doing a bare read() on the response object, call readlines() and then test each line against a regular expression. If a line matches, print the 14th line before it, but make sure the index does not go negative (a negative index in Python silently counts from the end of the list). E.g.:

import re

lines = response.readlines()

r = re.compile(r'MMa')
for i in range(len(lines)):
    if r.search(lines[i]):
        print lines[max(0, i-14)]
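
The same loop can be wrapped in a small reusable function. This is only a sketch, written for Python 3 rather than the Python 2 used above; the function name lines_before_matches, the offset parameter, and the sample data are assumptions for illustration, not part of the original code.

```python
import re


def lines_before_matches(lines, pattern, offset=14):
    """Return the line `offset` positions before each line that matches `pattern`."""
    regex = re.compile(pattern)
    found = []
    for i, line in enumerate(lines):
        if regex.search(line):
            # Clamp at 0 so a match within the first `offset` lines does not
            # wrap around to the end of the list through a negative index.
            found.append(lines[max(0, i - offset)])
    return found


# Hypothetical sample input: 19 numbered lines, then a line containing 'MMa'.
sample = ["line %d" % n for n in range(19)] + ["<td>MMa</td>"]
print(lines_before_matches(sample, r"MMa"))  # prints ['line 5']
```

With a real response, lines would come from response.readlines(); in Python 3 those are bytes, so they would need decoding (or a bytes pattern) before searching.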

Upvotes: 8

Adam Rosenfield

Reputation: 400314

You can split a single string into a list of lines using mystr.splitlines(). You can test whether a line contains a regular expression using re.search() (re.match() only matches at the start of the string). Once you find the matching line(s), you can index backwards into your list of lines to find the 14th line before each one.
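
A minimal sketch of this approach, written for Python 3. The body string is a made-up stand-in for what response.read() would return, and here matches that have fewer than 14 lines before them are skipped rather than clamped to the first line; that is a design choice, not something the answer specifies.

```python
import re

# Hypothetical stand-in for the text returned by response.read().
body = "\n".join("row %d" % n for n in range(30)) + "\nMMa total\n"

lines = body.splitlines()

# re.search finds the pattern anywhere in the line;
# re.match would only succeed if the line started with it.
hits = [
    lines[i - 14]
    for i, line in enumerate(lines)
    if i >= 14 and re.search(r"MMa", line)
]
print(hits)  # prints ['row 16']
```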

Upvotes: 0
