Python web scraper using Beautiful Soup 4

Question

I want to build a database of commonly used words. The script below runs fine, but my biggest issue is that I need all of the words to be in one column. What I did feels more like a hack than a real fix. Using Beautiful Soup, can you print everything in one column without extra blank lines?

import requests
import re
from bs4 import BeautifulSoup

# Website you want to scrape info from
res = requests.get("https://github.com/first20hours/google-10000-english/blob/master/google-10000-english-usa.txt")
# Getting just the content using bs4
soup = BeautifulSoup(res.content, "lxml")

# Creating the CSV file (text mode, so str can be written on Python 3)
with open('common_words.csv', 'w') as commonFile:
    # Grabbing the lines you want
    for node in soup.findAll("tr"):
        # Getting just the text and removing the html
        words = ''.join(node.findAll(text=True))
        # Removing the extra tabs and newlines
        ID = re.sub(r'[\t\n]', '', words)
        # Needed to add a break in the line to make the rows
        update = ID + '\n'
        # Now we add this to the file
        commonFile.write(update)
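
One possible way to avoid the regex and the manual line break is to clean each row's text first and let the csv module write the column. This is only a sketch: the helper names clean_rows and write_single_column are hypothetical, and it assumes each table row holds a single word surrounded by tabs and newlines, as in the script above.

```python
import csv


def clean_rows(row_texts):
    """Return one cleaned word per row, skipping rows that are only whitespace."""
    words = []
    for text in row_texts:
        # split() with no arguments drops tabs, newlines and repeated spaces
        word = " ".join(text.split())
        if word:  # an empty string means the row had no visible text
            words.append(word)
    return words


def write_single_column(words, path):
    """Write each word on its own line as a one-column CSV file."""
    # newline="" lets the csv module control the line endings itself
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerows([word] for word in words)
```

With the soup object from the script above, the row texts could be passed in as clean_rows(node.get_text() for node in soup.find_all("tr")), and the result handed to write_single_column(words, "common_words.csv"). Skipping empty rows is what keeps blank lines out of the file.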

