I'm using scrapy to pull data from a website. The unadulterated version comes like this:
{eps: 25}
{eps:[]}
{eps:[]}
{eps:[]}
{eps: 50}
{eps:[]}
{eps:[]}
{eps:[]}
Now I am not sure why the blank ones show up, but I am able to remove them with .replace. The issue is that when I use .replace, the result is like this:
25
50
# Code comment to show extra spaces.
I've tried .split, .sub, and .strip to no avail. I'm not sure what else to try.
UPDATE:
Adding source code
# coding: utf-8
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.exporter import CsvItemExporter
import re
import csv
import urlparse
from stockscrape.items import EPSItem

class epsScrape(BaseSpider):
    name = "eps"
    allowed_domains = ["investors.com"]
    ifile = open('test.txt', "r")
    reader = csv.reader(ifile)
    start_urls = []
    for row in ifile:
        url = row.replace("\n","")
        if url == "symbol":
            continue
        else:
            start_urls.append("http://research.investors.com/quotes/nyse-" + url + ".htm")
    ifile.close()

    def parse(self, response):
        f = open("eps.txt", "a+")
        sel = HtmlXPathSelector(response)
        sites = sel.select("//tbody/tr")
        items = []
        for site in sites:
            item = EPSItem()
            item['eps'] = site.select("td[contains(@class, 'rating')]/span/text()").extract()
            strItem = str(item)
            newItem = strItem.replace(" ","").replace("'","").replace("{eps:[","").replace("]}","").replace("u","").replace("\\r\\n",'').replace('$
            f.write("%s\n" % newItem)
        f.close()
test.txt has stock symbols in it like this:
MSFT
A
H
so on and so forth
Upvotes: 1
Views: 1190
Reputation: 1122382
Empty lines contain newlines; replace the \n too.
If you find that you end up removing all newlines that way, split on newlines instead and drop any empty strings:
outputstring = '\n'.join([line for line in inputstring.splitlines() if line.strip()])
This removes any empty lines, rejoining the remaining non-empty lines with fresh newlines.
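As a rough illustration of that filter, here is a small self-contained sketch. The helper name drop_blank_lines and the sample string are made up for the example; the sample only assumes the cleaned-up output looks like values separated by runs of blank or whitespace-only lines.

```python
def drop_blank_lines(text):
    """Return text with empty and whitespace-only lines removed."""
    # splitlines() handles both '\n' and '\r\n' line endings;
    # line.strip() is an empty (falsy) string for blank lines.
    return '\n'.join(line for line in text.splitlines() if line.strip())


# Hypothetical sample: two blank-line runs, one line of only spaces,
# and one Windows-style '\r\n' line ending.
sample = "25\n\n\n\n50\n   \n\r\n75\n"
cleaned = drop_blank_lines(sample)
print(cleaned)
```

Note that the join does not put a newline after the last line, so add one yourself if the file should end with it.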
If instead you are producing the output line by line, by printing or writing to a file, simply don't print or write when the line is empty:
newItem = newItem.replace(.., ..)
if newItem.strip():
    print newItem
    f.write('{}\n'.format(newItem))
The if statement tests for a line that contains more than just whitespace.
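The same check can be sketched as a standalone loop. This is only an illustration under a few assumptions: write_non_empty and raw_values are hypothetical names, an in-memory io.StringIO buffer stands in for eps.txt, and unlike the snippet above it writes the stripped value so stray \r\n or spaces are not carried into the file.

```python
import io


def write_non_empty(lines, out):
    """Write each line that has non-whitespace content; return how many were written."""
    written = 0
    for line in lines:
        cleaned = line.strip()
        if cleaned:  # empty strings are falsy, so blank lines are skipped
            out.write(u'{}\n'.format(cleaned))
            written += 1
    return written


# Hypothetical values, roughly what might remain after the .replace() chain.
raw_values = ['25', '', '\r\n', '  ', '50', '']
buf = io.StringIO()  # in-memory stand-in for the output file
count = write_non_empty(raw_values, buf)
print(buf.getvalue())
```

In the spider itself, the equivalent change would be to guard the existing f.write call with the if test rather than adding another .replace.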
Upvotes: 6