Reputation: 47
I have a question about extracting strings of varying length from lines whose fields are delimited only by '|' and spaces. Take a look at the following link:
http://ftp.nasdaqtrader.com/dynamic/SymDir/nasdaqlisted.txt
I am trying to extract all the company symbols in the first column of the file at the above link. However, I can't think of a loop that will do that and store the symbols in a way that makes them easy to retrieve later.
I was hoping any pr0s may have an opinion!
EDIT:
Hi, I understand some of your reservations. I would be very satisfied with an explanation of how to think about the solution logically.
Upvotes: 1
Views: 82
Reputation: 72
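Have a look at the Python csv module:
    import csv
    # The with block closes the file automatically, so no explicit close() is needed.
    with open('nasdaqlisted.txt', 'r', newline='') as csvFile:
        reader = csv.reader(csvFile, delimiter='|')
        for row in reader:
            print(row[0])  # the first column holds the symbol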
You just need to change the delimiter to '|' and it works out of the box.
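If you want to keep the symbols around instead of only printing them, one option is to collect them into a list. This is a minimal sketch, assuming the first line of the file is a header and the last line is a footer rather than a symbol row. The function name `read_symbols` and the sample rows are made up for illustration and are not taken from the real file:

```python
import csv
import io

def read_symbols(lines):
    """Return the first column of each data row in a '|'-delimited file."""
    rows = list(csv.reader(lines, delimiter='|'))
    # Assumption: rows[0] is the header and rows[-1] is a footer line.
    return [row[0] for row in rows[1:-1] if row]

# Made-up sample with the same general shape as the linked file.
sample = io.StringIO(
    "Symbol|Security Name|Market Category\n"
    "AAPL|Apple Inc. - Common Stock|Q\n"
    "MSFT|Microsoft Corporation - Common Stock|Q\n"
    "File Creation Time: 0101202012:00||\n"
)
symbols = read_symbols(sample)
print(symbols)
```

With the real file you would pass the open file object instead of the sample, for example inside `with open('nasdaqlisted.txt', newline='') as f:`.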
Upvotes: 0
Reputation: 813
I hope this helps your case where you directly scrape data off the text page:
    import requests
    response = requests.get('http://ftp.nasdaqtrader.com/dynamic/SymDir/nasdaqlisted.txt')
    document = response.text.splitlines()
    for line in document[1:-1]:  # skip the header (first) and footer (last) lines
        data = line.split('|')
        symbol = data[0]
        print(symbol)
You can skip the first and last lines of the document, since they are not associated with the symbols you are looking for. Also, splitlines creates a list of lines for you, so you can use list slicing to skip the first and last lines.
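To store the result in a form that is easy to look things up in later, the same split-on-'|' idea can fill a dictionary instead of printing. This is only a sketch: `parse_listing` is a hypothetical helper, the sample text is invented, and it assumes the second column holds the security name:

```python
def parse_listing(text):
    """Map each symbol to its security name from '|'-delimited text."""
    listing = {}
    # Assumption: the first line is a header and the last line is a footer.
    for line in text.splitlines()[1:-1]:
        fields = line.split('|')
        if len(fields) < 2:
            continue  # ignore blank or malformed lines
        listing[fields[0]] = fields[1]
    return listing

# Invented sample text; with requests you would pass response.text instead.
sample_text = (
    "Symbol|Security Name|Market Category\n"
    "AAPL|Apple Inc. - Common Stock|Q\n"
    "MSFT|Microsoft Corporation - Common Stock|Q\n"
    "File Creation Time: 0101202012:00||\n"
)
listing = parse_listing(sample_text)
print(sorted(listing))
```

A dictionary lets you check `'AAPL' in listing` or read `listing['AAPL']` directly, and `list(listing)` still gives you the plain list of symbols.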
Upvotes: 1