Atul Pant

Reputation: 1

Information in csv file not retrieved properly using python

I am scraping some content from this site. When I write the conference heads extracted from the site to a CSV file, the first name is not written properly: for example, if the word is microsoft it comes out as osoft, but all the other words are written correctly.

Here is my code:

import csv
import requests
from bs4 import BeautifulSoup

with open('random.csv', 'w') as csvfile:
    a = csv.writer(csvfile)
    a.writerow(["conferenceHead"])

    url = "..."  # the site linked above
    r = requests.get(url)
    soup = BeautifulSoup(r.content)
    links = soup.find_all("div")

    r_data = soup.find_all("div",{"class":"conferenceHead"})
    for item in r_data:
        conferenceHead = item.contents[1].text


        with open('random.csv','a') as csvfile:
            a = csv.writer(csvfile)
            data = [conferenceHead]
        a.writerow(data)

Upvotes: 0

Views: 39

Answers (1)

JustMe

Reputation: 710

Well, you have three issues in your code:

  • two with open() statements on the same file
  • the second open, in append mode, is inside the loop, which makes this even worse
  • the last writerow is outside the inner with block, so by the time it runs csvfile is already closed
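The third point can be checked on its own. This is a minimal sketch for Python 3; the helper name write_after_close, the file name demo.csv and the sample row are made up for illustration. It keeps a csv.writer after its with block has ended, as the unindented a.writerow(data) does, and the write then fails with ValueError instead of reaching the file.

```python
import csv
import os
import tempfile


def write_after_close(path):
    """Hypothetical helper: return the error raised by writing through a closed file."""
    with open(path, "w", newline="") as csvfile:
        writer = csv.writer(csvfile)
    # The with block has ended here, so csvfile is already closed.
    try:
        writer.writerow(["Microsoft"])
    except ValueError as exc:
        return exc
    return None


path = os.path.join(tempfile.mkdtemp(), "demo.csv")
print(write_after_close(path))
```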

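This can cause buffered data to be written in the wrong place and truncate the strings you are saving: the header written through the outer 'w' handle stays in that handle's buffer until it closes, and is then written at the start of the file, over the beginning of the rows the 'a' handles have already written there.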
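A small offline reproduction of the question's pattern, as a sketch for CPython 3 with default buffering; the helper two_handles, the file name and the conference names are hypothetical. The header goes through an outer 'w' handle and each row through a fresh 'a' handle, as in the question's loop (with the writerow kept inside the inner block so it does not fail on a closed file).

```python
import csv
import os
import tempfile


def two_handles(path, heads):
    """Hypothetical helper mimicking the question: one 'w' handle, a new 'a' handle per row."""
    with open(path, "w", newline="") as outer:
        csv.writer(outer).writerow(["conferenceHead"])  # stays in outer's buffer
        for head in heads:
            with open(path, "a", newline="") as inner:
                csv.writer(inner).writerow([head])  # flushed when inner closes
    # Closing outer flushed the header at offset 0, on top of the first row.
    with open(path, newline="") as f:
        return f.read()


path = os.path.join(tempfile.mkdtemp(), "demo.csv")
print(repr(two_handles(path, ["Microsoft Research Summit", "Second Conference"])))
```

With these inputs the file reads back as conferenceHead, then ch Summit, then Second Conference: the 16-byte header line overwrote the first 16 bytes of Microsoft Research Summit, the same kind of truncation as microsoft turning into osoft.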

After fixing these errors (removing with open('random.csv','a') as csvfile and fixing the indentation), the code runs and the output is no longer truncated:

import csv
import requests
from bs4 import BeautifulSoup

# newline='' is what the csv docs recommend for files opened on Python 3
with open('random.csv', 'w', newline='') as csvfile:
    a = csv.writer(csvfile)
    a.writerow(["conferenceHead"])

    url = "http://www.allconferences.com/search/index"\
          "/Category__parent_id:1/Venue__country:United%20States"\
          "/Conference__start_date__from:01-01-2010/sort:start_date"\
          "/direction:asc/showLastConference:1/page:7/"
    r = requests.get(url)
    # name the parser explicitly instead of letting bs4 guess one
    soup = BeautifulSoup(r.content, "html.parser")

    r_data = soup.find_all("div", {"class": "conferenceHead"})

    # one open handle and one writer: every row goes through the same file object
    for item in r_data:
        conferenceHead = item.contents[1].text
        data = [conferenceHead]
        a.writerow(data)
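To check the fix without downloading the page, the writing step can be tried on its own. In this sketch the helper names write_conference_heads and read_rows and the sample names are hypothetical stand-ins for the scraped values; the point is that the header and every row go through one handle and one csv.writer, so reading the file back returns every name intact.

```python
import csv
import os
import tempfile


def write_conference_heads(path, heads):
    """Hypothetical helper: header and all rows through a single handle and writer."""
    with open(path, "w", newline="") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(["conferenceHead"])
        for head in heads:
            writer.writerow([head])


def read_rows(path):
    """Hypothetical helper: read the CSV back as a list of rows."""
    with open(path, newline="") as csvfile:
        return list(csv.reader(csvfile))


path = os.path.join(tempfile.mkdtemp(), "random.csv")
write_conference_heads(path, ["Microsoft Research Summit", "Second Conference"])
print(read_rows(path))
```

Reading the file back gives [['conferenceHead'], ['Microsoft Research Summit'], ['Second Conference']].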

Upvotes: 1
