Lawlace

Reputation: 19

Python Scrapy: I can't get any data

from urllib import parse
import scrapy
from scrapy.linkextractors import LinkExtractor
import codecs
import json

class WanikaniSpider(scrapy.Spider):
    name = 'japandict'
    allowed_domains = ['www.wanikani.com']         
    url = ('https://www.wanikani.com/kanji/')
    start_urls = []
    kanjis = ['悪' ,'安' ,'以' ,'飲' ,'意' ,'医' ,'員' ,'運' ,'映' ,'英' ,'屋' ,'駅' ,'歌' ,'牛']
    liste=[]
    for kanji in kanjis:
        liste.append(kanji)
        nurl = url + kanji
        start_urls.append(nurl)
    file =  open("kanji.txt","a",encoding="utf-8")
    file1 = open("onyomi.txt","a",encoding="utf-8")
    file2 = open("kunyomi.txt","a",encoding="utf-8") 
    file3 = open("meanings.txt","a",encoding="utf-8")       
           
           
    def parse(self, response):
        print(response.url)
        kanjiicon = response.xpath('//*[@id="main"]/body/div[1]/div[3]/div/div/header/h1/span/text()').getall()
        meanings = response.xpath('//*[@id="main"]/body/div[1]/div[3]/div/div/header/h1/text()').getall()
        reading = response.xpath('//*[@id="reading"]/div') 
        for onkun in reading:
            onyomi= onkun.xpath('//*[@id="reading"]/div/div[1]/p/text()').getall()
            kunyomi= onkun.xpath('//*[@id="reading"]/div/div[2]/p/text()').getall()                
        for x in onyomi:
            x.strip()
            self.file1.write(x + "\n")
            self.file1.close
        for y in kanjiicon:
            self.file.write(y + "\n")
            self.file.close
        for z in kunyomi:
            self.file2.write(z + "\n")
            self.file.close
        for p in meanings:
            self.file3.write(p + "\n")
            self.file.close

Kanji are Japanese characters that have onyomi and kunyomi readings. I want to get these readings and the meaning of each kanji and write them to text files, and this website lets me do that. The spider creates the txt files, but they are empty.

Upvotes: 1

Views: 66

Answers (1)

Alexander

Reputation: 17291

I see a few issues with your code. I am not certain this is everything needed to make your project work, but one main issue is how you are opening and closing the files. Right now you open them in your class definition and then try to close them inside parse on every request (a couple of those lines even reference the wrong file object). Even once that is fixed, the very first call to parse would close the files and leave them unwritable for every later request. What you should do instead is use Scrapy item pipelines for directing output and writing the data to files.
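
As a side note, self.file.close without parentheses is only an attribute lookup, not a call, which is why the spider never crashed despite "closing" its files on every request. A quick demonstration:

f = open("demo.txt", "a", encoding="utf-8")
f.close    # attribute lookup only; nothing is called and the file stays open
f.close()  # this actually closes the file

With pipelines, the spider only yields items and never touches the files itself. For example: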

in your spider file:

import scrapy

class WanikaniSpider(scrapy.Spider):
    name = 'japandict'
    allowed_domains = ['www.wanikani.com']
    url = ('https://www.wanikani.com/kanji/')
    start_urls = []
    kanjis = ['悪' ,'安' ,'以' ,'飲' ,'意' ,'医' ,'員' ,'運' ,'映' ,'英' ,'屋' ,'駅' ,'歌' ,'牛']
    liste=[]
    for kanji in kanjis:
        liste.append(kanji)
        nurl = url + kanji
        start_urls.append(nurl)

    def parse(self, response):
        kanjiicon = response.xpath('//*[@id="main"]/body/div[1]/div[3]/div/div/header/h1/span/text()').getall()
        meanings = response.xpath('//*[@id="main"]/body/div[1]/div[3]/div/div/header/h1/text()').getall()
        for y in kanjiicon:
            yield {"kanji": y.strip()}
        for p in meanings:
            yield {"meanings": p.strip()}
        reading = response.xpath('//*[@id="reading"]/div')
        for onkun in reading:
            onyomi= onkun.xpath('//*[@id="reading"]/div/div[1]/p/text()').getall()
            kunyomi= onkun.xpath('//*[@id="reading"]/div/div[2]/p/text()').getall()
            for x in onyomi:
                yield {"onyomi": x.strip()}
            for z in kunyomi:
                yield {"kunyomi": z.strip()}

then in your pipelines.py file

class SpidersPipeline:

    def open_spider(self, spider):
        # called once when the spider starts: open one output file per field
        self.files = [open(name, "a", encoding="utf-8") for name in
                      ["kanji.txt", "onyomi.txt", "kunyomi.txt", "meanings.txt"]]

    def process_item(self, item, spider):
        # route each yielded item to the file that matches its key
        for i, kw in enumerate(["kanji", "onyomi", "kunyomi", "meanings"]):
            if kw in item:
                self.files[i].write(item[kw] + "\n")
        return item  # pipelines should pass the item along to any later stages

    def close_spider(self, spider):
        # called once when the spider finishes: close (and flush) the files
        for f in self.files:
            f.close()

and remember to uncomment the ITEM_PIPELINES setting in the settings.py file:

ITEM_PIPELINES = {
   'spiders.pipelines.SpidersPipeline': 300,   # <- make sure it is uncommented
}
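
Once that is in place, running scrapy crawl japandict from the project root should populate the four text files. If you would rather launch it from a plain Python script, here is a minimal sketch; the import path for the spider class is a guess based on your project name, so point it at wherever your spider module actually lives:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# hypothetical import path -- adjust to your actual spider module
from spiders.spiders.japandict import WanikaniSpider

# get_project_settings() reads settings.py, including ITEM_PIPELINES,
# provided this script is run from inside the Scrapy project
process = CrawlerProcess(get_project_settings())
process.crawl(WanikaniSpider)
process.start()  # blocks until the crawl finishes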

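If the files are still empty after these changes, the XPath expressions themselves may simply not match the page markup; browser-copied XPaths often reflect the DOM after JavaScript has run, which can differ from the HTML Scrapy actually receives. A quick, self-contained way to sanity-check an expression with Scrapy's own Selector (the HTML below is a made-up stand-in, not the real WaniKani page):

from scrapy.selector import Selector

# made-up markup mimicking the structure the reading XPaths expect
sample_html = '''
<div id="reading">
  <div>
    <div><p>あく</p></div>
    <div><p>わる</p></div>
  </div>
</div>
'''

sel = Selector(text=sample_html)
print(sel.xpath('//*[@id="reading"]/div/div[1]/p/text()').getall())  # onyomi column
print(sel.xpath('//*[@id="reading"]/div/div[2]/p/text()').getall())  # kunyomi column

If the real page differs from what the spider expects, getall() returns an empty list, nothing is yielded, and the output files stay empty.
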
Upvotes: 2
