from urllib import parse
import scrapy
from scrapy.linkextractors import LinkExtractor
import codecs
import json

class WanikaniSpider(scrapy.Spider):
    name = 'japandict'
    allowed_domains = ['www.wanikani.com']
    url = ('https://www.wanikani.com/kanji/')
    start_urls = []
    kanjis = ['悪', '安', '以', '飲', '意', '医', '員', '運', '映', '英', '屋', '駅', '歌', '牛']
    liste = []
    for kanji in kanjis:
        liste.append(kanji)
        nurl = url + kanji
        start_urls.append(nurl)
    file = open("kanji.txt", "a", encoding="utf-8")
    file1 = open("onyomi.txt", "a", encoding="utf-8")
    file2 = open("kunyomi.txt", "a", encoding="utf-8")
    file3 = open("meanings.txt", "a", encoding="utf-8")

    def parse(self, response):
        print(response.url)
        kanjiicon = response.xpath('//*[@id="main"]/body/div[1]/div[3]/div/div/header/h1/span/text()').getall()
        meanings = response.xpath('//*[@id="main"]/body/div[1]/div[3]/div/div/header/h1/text()').getall()
        reading = response.xpath('//*[@id="reading"]/div')
        for onkun in reading:
            onyomi = onkun.xpath('//*[@id="reading"]/div/div[1]/p/text()').getall()
            kunyomi = onkun.xpath('//*[@id="reading"]/div/div[2]/p/text()').getall()
            for x in onyomi:
                x.strip()
                self.file1.write(x + "\n")
            self.file1.close
            for y in kanjiicon:
                self.file.write(y + "\n")
            self.file.close
            for z in kunyomi:
                self.file2.write(z + "\n")
            self.file.close
            for p in meanings:
                self.file3.write(p + "\n")
            self.file.close
Kanji are Japanese characters that have onyomi and kunyomi readings. I want to scrape these readings and the meaning of each kanji from a website and write them to text files. The spider creates the txt files, but they stay empty.
I see a few issues with your code. I am not certain this is everything needed to make your project work, but one main issue is how you open and close the files. You open them in the class definition and then try to close them on every call to parse. Even if that worked as intended, the files would be closed after the very first response and no longer writable for later requests; as written it is actually worse, because self.file1.close is missing its parentheses, so the method is never called and the files are never properly flushed or closed at all. What you should do instead is use scrapy item pipelines for directing output and writing data to files.
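To see that pitfall in isolation, here is a minimal sketch independent of Scrapy:

f = open("out.txt", "a", encoding="utf-8")
f.close       # attribute access only; nothing is called and the file stays open
f.write("x")  # still accepted, but the data may sit in a buffer indefinitely
f.close()     # the parentheses make the call: this flushes and closes the file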
With a pipeline, the spider itself only yields items and never touches files. In your spider file:
import scrapy

class WanikaniSpider(scrapy.Spider):
    name = 'japandict'
    allowed_domains = ['www.wanikani.com']
    url = ('https://www.wanikani.com/kanji/')
    start_urls = []
    kanjis = ['悪', '安', '以', '飲', '意', '医', '員', '運', '映', '英', '屋', '駅', '歌', '牛']
    liste = []
    for kanji in kanjis:
        liste.append(kanji)
        nurl = url + kanji
        start_urls.append(nurl)

    def parse(self, response):
        kanjiicon = response.xpath('//*[@id="main"]/body/div[1]/div[3]/div/div/header/h1/span/text()').getall()
        meanings = response.xpath('//*[@id="main"]/body/div[1]/div[3]/div/div/header/h1/text()').getall()
        for y in kanjiicon:
            yield {"kanji": y.strip()}
        for p in meanings:
            yield {"meanings": p.strip()}
        reading = response.xpath('//*[@id="reading"]/div')
        for onkun in reading:
            onyomi = onkun.xpath('//*[@id="reading"]/div/div[1]/p/text()').getall()
            kunyomi = onkun.xpath('//*[@id="reading"]/div/div[2]/p/text()').getall()
            for x in onyomi:
                yield {"onyomi": x.strip()}
            for z in kunyomi:
                yield {"kunyomi": z.strip()}
Then, in your pipelines.py file:
class SpidersPipeline:

    def process_item(self, item, spider):
        # write each field to its corresponding output file
        for i, kw in enumerate(["kanji", "onyomi", "kunyomi", "meanings"]):
            if kw in item:
                self.files[i].write(item[kw] + "\n")
        return item  # always return the item so any later pipelines can process it

    def open_spider(self, spider):
        # opened once, when the spider starts
        self.files = [open(x, "a", encoding="utf-8") for x in [
            "kanji.txt", "onyomi.txt", "kunyomi.txt", "meanings.txt"]]

    def close_spider(self, spider):
        # closed once, when the spider finishes
        for f in self.files:
            f.close()
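As a side note on the design: a dict keyed by field name avoids the parallel list/index bookkeeping. A minimal equivalent sketch (same file names, same behavior):

class SpidersPipeline:

    def open_spider(self, spider):
        # one output file per item field, keyed by the field name
        self.files = {kw: open(kw + ".txt", "a", encoding="utf-8")
                      for kw in ("kanji", "onyomi", "kunyomi", "meanings")}

    def process_item(self, item, spider):
        for kw, value in item.items():
            if kw in self.files:
                self.files[kw].write(value + "\n")
        return item

    def close_spider(self, spider):
        for f in self.files.values():
            f.close()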
Finally, remember to enable (uncomment) the pipeline in your settings.py file:
ITEM_PIPELINES = {
    'spiders.pipelines.SpidersPipeline': 300,  # <- make sure it is uncommented
}
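Here 'spiders.pipelines.SpidersPipeline' assumes your Scrapy project module is named spiders; adjust the dotted path to match your own project. With the pipeline enabled you can run the spider as usual:

scrapy crawl japandict

If the output files still come out empty after these changes, verify the XPath expressions themselves; you can test them against a live page with Scrapy's shell (using one of your kanji URLs as an example):

scrapy shell 'https://www.wanikani.com/kanji/悪'
>>> response.xpath('//*[@id="reading"]/div').getall()

If that returns an empty list, parse never yields any items and the pipeline has nothing to write.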