Reputation: 3496
I'm looking to do the equivalent of grep -B14 MMA in Python.
I have a URL that I open and it spits out many lines. I want to
I don't even know where to begin with this.
import urllib
import urllib2

url = "https://longannoyingurl.com"
opts = {
    'action': 'Dump+It'
}
data = urllib.urlencode(opts)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
print response.read()  # gives the full html output
Upvotes: 3
Views: 5559
Reputation: 3496
Thanks to Dan, I got my result:
import urllib
import urllib2
import re

url = "https://somelongannoyingurl/blah/servlet"
opts = {
    'authid': 'someID',
    'action': 'Dump+It'
}
data = urllib.urlencode(opts)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
lines = response.readlines()
r = re.compile(r'MMa')
for i in range(len(lines)):
    if r.search(lines[i]):
        line = lines[max(0, i-14)].strip()
        junk, mma = line.split('>')
        print mma.strip()
Upvotes: 1
Reputation: 11236
Instead of just doing a bare read() on the response object, call readlines() and then run a regular expression over each line. If a line matches, print the 14th line before it, using max(0, i-14) so that you never index negatively. E.g.:
import re

lines = response.readlines()
r = re.compile(r'MMa')
for i in range(len(lines)):
    if r.search(lines[i]):
        print lines[max(0, i-14)]
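
Note that the snippet above prints only the single line 14 positions before each match, whereas grep -B14 prints all 14 preceding lines plus the match itself. If the full context is wanted, one possible sketch keeps a rolling window in a collections.deque. The function name grep_before and its parameters are made up for illustration, and unlike grep it does not merge overlapping context from nearby matches:

```python
import re
from collections import deque


def grep_before(lines, pattern, before=14):
    """Yield (context, matching_line) pairs.

    `context` is a list of up to `before` lines that preceded the match.
    Hypothetical helper, not part of any library.
    """
    regex = re.compile(pattern)
    # A deque with maxlen drops the oldest entry automatically,
    # so it always holds at most the last `before` lines seen.
    window = deque(maxlen=before)
    for line in lines:
        if regex.search(line):
            yield list(window), line
        window.append(line)
```

With the response from the question, this would be called as grep_before(response.readlines(), r'MMa'), printing each context line followed by the match.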
Upvotes: 8
Reputation: 400314
You can split a single string into a list of lines using mystr.splitlines(). You can test whether a line contains a match for a regular expression using re.search() (re.match() only matches at the start of the string). Once you find the matching line(s), you can index backwards into your list of lines to find the 14th line before each one.
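
A minimal sketch of those steps, assuming the whole page has already been read into one string (for example with response.read()). The function name lines_before_matches and the offset parameter are illustrative, not taken from the question:

```python
import re


def lines_before_matches(text, pattern, offset=14):
    """Return the line `offset` positions before each matching line.

    Hypothetical helper; the index is clamped at 0 so a match near
    the top of the text never wraps around via negative indexing.
    """
    lines = text.splitlines()
    found = []
    for i, line in enumerate(lines):
        # re.search scans the whole line; re.match would only
        # succeed if the pattern appeared at the very start.
        if re.search(pattern, line):
            found.append(lines[max(0, i - offset)])
    return found
```

Clamping with max(0, i - offset) means a match in the first 14 lines returns the first line of the text rather than silently picking a line from the end.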
Upvotes: 0