phapha pha

Python CSV file output: data is crammed into one column instead of multiple columns

So, I'm doing web scraping using Beautiful Soup.

Let's say I want to store the data in a CSV file on every iteration of the loop below:

containers = page_soup.findAll("div",{"class":"product-img" })

filename = "result.csv"
f = open(filename, "w")
headers = "Link,Image\n"
f.write(headers)

for container in containers:
    items = container.findAll("a")

    for item in items:
        datalink = item.attrs['href']
        dataimg = item.attrs['src']

        f.write(datalink + "," + dataimg + "\n")

f.close()

When I open the CSV file with Excel, the data is crammed into one column instead of two.
What I got:

Column A  
Link,Image  
link1,img1  
link2,img2  
link3,img3  
link4,img4  
link5,img5  

What I expected:

Column A       Column B  
Link           Image  
link1          img1  
link2          img2  
link3          img3  
link4          img4  
link5          img5  

Answers (1)

hunteke

Short answer: use a library. In this case, that means Python's built-in csv module.

Longer answer: without the exact code and the input data you're using, the best I can do is guess at the problem. In short, though, because of edge cases such as embedded quotes and commas, CSV is slightly more complicated than you think. Use a library that has already sussed out the details.
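To illustrate the embedded-comma edge case, here is a small sketch using made-up values (the example.com URLs are hypothetical, not taken from the question). Joining fields by hand, as the question's code does, turns a comma inside a value into an extra column, whereas csv.writer quotes that field so it reads back as a single value.

```python
import csv
import io

# Hypothetical scraped values; the second URL contains a comma.
rows = [
    ("https://example.com/a", "img1.jpg"),
    ("https://example.com/b?tags=red,blue", "img2.jpg"),
]

# Hand-rolled joining, as in the question: nothing protects the embedded comma.
naive = "".join(link + "," + img + "\n" for link, img in rows)

# csv.writer quotes any field containing the delimiter, a quote character or a newline.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
quoted = buf.getvalue()

# Parse both strings back and count the fields in each row.
naive_counts = [len(r) for r in csv.reader(io.StringIO(naive))]
quoted_counts = [len(r) for r in csv.reader(io.StringIO(quoted))]
print(naive_counts)   # [2, 3]
print(quoted_counts)  # [2, 2]
```

The second naive row splits into three fields, while the quoted version keeps two columns in every row.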

Additionally, don't "C-think" your file handling: use a with statement. That is:

# "Bad"
f = open('somefile', 'w')
f.write(data)
f.close()

# Good
with open('somefile', 'w') as f:
    f.write(data)

I'll leave it to the docs (section 7.2) to explain why with is the (much) better path.
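A quick sketch of the main benefit: the file is closed as soon as the with block ends, even when an exception escapes from inside it. The temporary path and the simulated error are illustrative only.

```python
import os
import tempfile

# Write to a throwaway file in a fresh temporary directory (the file name is arbitrary).
path = os.path.join(tempfile.mkdtemp(), "somefile.txt")

with open(path, "w") as f:
    f.write("some data\n")
    print(f.closed)  # False: still open inside the block

print(f.closed)  # True: closed automatically when the block exits

# The file is also closed when an exception leaves the block.
try:
    with open(path, "a") as g:
        raise ValueError("simulated scraping error")
except ValueError:
    pass
print(g.closed)  # True
```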

Finally, example code to get you on your way:

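import csv

# page_soup is the BeautifulSoup object from the question's scraping code
containers = page_soup.find_all("div", {"class": "product-img"})

filename = "result.csv"
# newline="" is what the csv docs recommend; without it, Windows shows blank rows
with open(filename, "w", newline="") as f:
    csvwriter = csv.writer(f, quoting=csv.QUOTE_MINIMAL)
    csvwriter.writerow(("Link", "Image"))

    for container in containers:
        items = container.find_all("a")

        for item in items:
            datalink = item.attrs['href']
            dataimg = item.attrs['src']

            # writerow quotes any field that contains a comma or a quote
            csvwriter.writerow((datalink, dataimg))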
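Because page_soup comes from the question's scraping code, the example above cannot run on its own. Below is a self-contained sketch of just the CSV-writing step, assuming the (link, image) pairs have already been extracted; the function name write_links_csv and the sample pairs are hypothetical. It opens the file with newline="", as the csv module documentation recommends, so rows are terminated by exactly one "\r\n" rather than showing up with blank lines in between on Windows.

```python
import csv
import os
import tempfile


def write_links_csv(filename, pairs):
    """Write a header row plus one (link, image) row per pair to filename."""
    # newline="" lets the csv module control line endings itself.
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)  # QUOTE_MINIMAL is the default quoting mode
        writer.writerow(("Link", "Image"))
        writer.writerows(pairs)


# Hypothetical sample data standing in for the scraped values.
pairs = [("link1", "img1"), ("link2", "img2"), ("link3", "img3")]
path = os.path.join(tempfile.mkdtemp(), "result.csv")
write_links_csv(path, pairs)

# Read the file back to check that every row has two columns.
with open(path, newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))
print(rows)
# [['Link', 'Image'], ['link1', 'img1'], ['link2', 'img2'], ['link3', 'img3']]
```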
