Reputation: 179
I'm trying to extract the first ISS TLE (Two Line Element set) from this website.
I need the first three lines that follow the text
TWO LINE MEAN ELEMENT SET
that is, the ISS name line, line 1, and line 2.
I can get the text that contains what I want using Beautiful Soup, but I don't know how to extract just those lines from it. I can't use split()
because I need to preserve the whitespace in those three lines exactly. How can this be done?
import urllib2
from bs4 import BeautifulSoup
import ephem
import datetime
nasaissurl = 'http://spaceflight.nasa.gov/realdata/sightings/SSapplications/Post/JavaSSOP/orbit/ISS/SVPOST.html'
soup = BeautifulSoup(urllib2.urlopen(nasaissurl), 'html.parser')
body = soup.find_all("pre")
index = 0
firstTLE = False
for tag in body:
    if "ISS" in tag.text:
        print tag.text
Upvotes: 2
Views: 950
Reputation: 22440
You can achieve the same in several ways. Here is another approach:
from bs4 import BeautifulSoup
import requests
URL = "https://spaceflight.nasa.gov/realdata/sightings/SSapplications/Post/JavaSSOP/orbit/ISS/SVPOST.html"
soup = BeautifulSoup(requests.get(URL).text,"lxml")
for item in soup.select("pre"):
    lines = item.text.splitlines()  # split once instead of on every access
    for i, line in enumerate(lines):
        if "25544U" in line:  # line 1 of an ISS TLE carries the catalogue number 25544U
            # strip() only trims the ends; spacing inside each line is kept
            doc = lines[i-1].strip()   # name line
            doc1 = line.strip()        # line 1
            doc2 = lines[i+1].strip()  # line 2
            print("{}\n{}\n{}\n".format(doc, doc1, doc2))
Partial output:
ISS
1 25544U 98067A 18054.51611082 .00016717 00000-0 10270-3 0 9009
2 25544 51.6368 225.3935 0003190 125.8429 234.3021 15.54140528 20837
ISS
1 25544U 98067A 18055.54493747 .00016717 00000-0 10270-3 0 9010
2 25544 51.6354 220.2641 0003197 130.5210 229.6221 15.54104949 20991
ISS
1 25544U 98067A 18056.50945749 .00016717 00000-0 10270-3 0 9022
2 25544 51.6372 215.4558 0003149 134.4837 225.6573 15.54146916 21143
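The same catalogue-number search can be tried offline on a plain string. This is a minimal sketch, assuming the name line sits directly above line 1 as in the output above. The function name `find_tles` and the sample text are hypothetical; the sample values are copied from the output above, but its spacing is made up for illustration. It skips `strip()` so that leading whitespace is kept too, and it guards the neighbouring indices.

```python
def find_tles(text, catalog_id="25544U"):
    """Return (name, line1, line2) tuples for each TLE whose line 1 contains catalog_id."""
    lines = text.splitlines()
    tles = []
    for i, line in enumerate(lines):
        if catalog_id in line:
            # Only accept a match that has a line above and a line below it,
            # so index i - 1 cannot wrap around and i + 1 cannot overflow.
            if 0 < i < len(lines) - 1:
                tles.append((lines[i - 1], line, lines[i + 1]))
    return tles


# Illustrative sample: values from the output above, spacing invented.
sample = (
    "    ISS\n"
    "    1 25544U 98067A   18054.51611082  .00016717  00000-0  10270-3 0  9009\n"
    "    2 25544  51.6368 225.3935 0003190 125.8429 234.3021 15.54140528 20837\n"
)

name, line1, line2 = find_tles(sample)[0]
print(repr(line1))  # repr() makes the preserved spacing visible
```

Because the lines are returned untouched, they can be passed on unchanged to whatever parses the TLE.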
Upvotes: 1
Reputation: 49774
If you split the text into lines and process them one at a time, you can rejoin the three lines you need once you find the marker, like this:
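def process_tag_text(tag_text):
    marker = 'TWO LINE MEAN ELEMENT SET'
    text = iter(tag_text.split('\n'))
    results = []  # local list, so the caller's list is not extended with itself
    for line in text:
        if marker in line:
            next(text)  # skip the line that follows the marker
            # the next three lines are the name line, line 1 and line 2
            results.append('\n'.join(
                (next(text), next(text), next(text))))
    return results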
import urllib2
from bs4 import BeautifulSoup
nasaissurl = 'http://spaceflight.nasa.gov/realdata/sightings/' \
             'SSapplications/Post/JavaSSOP/orbit/ISS/SVPOST.html'
soup = BeautifulSoup(urllib2.urlopen(nasaissurl), 'html.parser')
body = soup.find_all("pre")
results = []
for tag in body:
    if "ISS" in tag.text:
        results.extend(process_tag_text(tag.text))
print('\n'.join(results))
Output:
ISS
1 25544U 98067A 18054.51611082 .00016717 00000-0 10270-3 0 9009
2 25544 51.6368 225.3935 0003190 125.8429 234.3021 15.54140528 20837
ISS
1 25544U 98067A 18055.54493747 .00016717 00000-0 10270-3 0 9010
2 25544 51.6354 220.2641 0003197 130.5210 229.6221 15.54104949 20991
ISS
1 25544U 98067A 18056.50945749 .00016717 00000-0 10270-3 0 9022
2 25544 51.6372 215.4558 0003149 134.4837 225.6573 15.54146916 21143
ISS
1 25544U 98067A 18057.34537198 .00016717 00000-0 10270-3 0 9031
2 25544 51.6399 211.2932 0002593 130.2258 229.9121 15.54133048 21277
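Since the question asks for only the first element set, here is a minimal sketch of a variant that stops at the first marker. It assumes the three TLE lines are the first non-blank lines after the marker. The function name `first_tle_after_marker` and the sample text are hypothetical; the sample values are copied from the output above, but its layout and spacing are made up. Instead of skipping exactly one line, it skips any blank lines, which is a small change from the approach above.

```python
from itertools import islice


def first_tle_after_marker(text, marker="TWO LINE MEAN ELEMENT SET"):
    """Return the first three non-blank lines after the first marker, or None."""
    lines = iter(text.splitlines())
    for line in lines:
        if marker in line:
            # Keep consuming the same iterator, ignoring blank separator lines.
            non_blank = (candidate for candidate in lines if candidate.strip())
            tle = list(islice(non_blank, 3))
            return tle if len(tle) == 3 else None
    return None


# Illustrative sample: values from the output above, layout invented.
sample = (
    "    TWO LINE MEAN ELEMENT SET\n"
    "\n"
    "    ISS\n"
    "    1 25544U 98067A   18054.51611082  .00016717  00000-0  10270-3 0  9009\n"
    "    2 25544  51.6368 225.3935 0003190 125.8429 234.3021 15.54140528 20837\n"
    "\n"
    "    TWO LINE MEAN ELEMENT SET\n"
    "\n"
    "    ISS\n"
    "    1 25544U 98067A   18055.54493747  .00016717  00000-0  10270-3 0  9010\n"
    "    2 25544  51.6354 220.2641 0003197 130.5210 229.6221 15.54104949 20991\n"
)

tle = first_tle_after_marker(sample)
print("\n".join(tle))
```

The lines come back exactly as they appear in the text, so the whitespace inside and around them is unchanged.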
Upvotes: 1