Reputation: 21
I've been working on a program that goes through links I already have saved in a text file (mostly summer opportunities, camps, etc.) and scrapes each page to see whether keywords like "scholarship" or "financial aid" appear. However, when I run it, it gives me the error in the title above.
This question has been asked a few times, but the cause seems to differ from person to person. So I gather there's probably a Unicode issue involved, but I have no idea where or why it would occur.
This is the code:
import BeautifulSoup
import requests
import nltk

file_from = open("links.txt", "r")
list_of_urls = file_from.read().splitlines()
aid_words = ["financial", "aid", "merit", "scholarship"]
count = 0
fin_aid = []

while count <= 10:
    for url in list_of_urls:
        clean = 1
        result = "nothing found"
        source = requests.get(url)
        plain_text = source.text
        soup = BeautifulSoup.BeautifulSoup(plain_text)
        print (str(url).upper())
        for links in soup.findAll('p', text=True):
            tokenized_text = nltk.word_tokenize(links)
            for word in tokenized_text:
                if word not in aid_words:
                    print ("not it " + str(clean))
                    clean += 1
                    pass
                else:
                    result = str(word)
                    print (result)
                    fin_aid.append(url)
                    break
        count += 1
        the_golden_book = {"link: ": str(url), "word found: ": str(result)}
        fin_aid.append(the_golden_book)

file_to = open("links_with_aid.txt", "w")
file_to.write(str(fin_aid))
file_to.close()
print ("scrape finished")
print (str(fin_aid))
Basically, I wanted to take all the links from links.txt, visit the first ten (as a test), and search each page for the four words in the list "aid_words". If none of the words has been found yet, it prints "not it" plus the number of words checked so far; if one is found, it prints the detected word (so that I can visit the link later and check whether it's a false alarm).
When I run this through the Command Prompt, this is what it shows right before the error message.
Traceback (most recent call last):
  File "finaid.py", line 20, in <module>
    soup = BeautifulSoup.BeautifulSoup(plain_text.encode("utf-8"))
  File "C:\Python27\lib\site-packages\BeautifulSoup.py", line 1522, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "C:\Python27\lib\site-packages\BeautifulSoup.py", line 1147, in __init__
    self._feed(isHTML=isHTML)
  File "C:\Python27\lib\site-packages\BeautifulSoup.py", line 1189, in _feed
    SGMLParser.feed(self, markup)
  File "C:\Python27\lib\sgmllib.py", line 104, in feed
    self.goahead(0)
  File "C:\Python27\lib\sgmllib.py", line 143, in goahead
    k = self.parse_endtag(i)
  File "C:\Python27\lib\sgmllib.py", line 320, in parse_endtag
    self.finish_endtag(tag)
  File "C:\Python27\lib\sgmllib.py", line 358, in finish_endtag
    method = getattr(self, 'end_' + tag)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 7-9: ordinal not in range(128)
I'm running this on Python 2.7.10 on Windows 8.1. Thanks for any help you can provide! As far as I can tell, it shouldn't be anything in "links.txt", which is literally just links that a colleague crawled and saved earlier.
Upvotes: 0
Views: 1164
Reputation: 61
I do quite a bit of website scraping, and I can tell you this: please try to write your scraper code using Python 3. As soon as I updated my scrapers to use Python 3, a lot of my encoding issues went away. If you do move to Python 3 and want to keep the contents of your output file intact, be sure to open it with 'a' (append) instead of 'w' (which truncates the file).
Let me know if you have specific questions about making that transition.
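For example, the write step at the end of your script might look like this in Python 3 (a sketch only; the filename is the one from your script, and the dictionary is stand-in data):

```python
# Python 3 sketch: open with 'a' to append instead of overwrite, and be
# explicit about the encoding so Windows' default codec can't surprise you.
fin_aid = [{"link": "http://example.com", "word found": "scholarship"}]  # stand-in data

with open("links_with_aid.txt", "a", encoding="utf-8") as file_to:
    for entry in fin_aid:
        file_to.write(str(entry) + "\n")
```

The `with` block also closes the file for you, so the separate `file_to.close()` call goes away.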
As for the "expected string or buffer" error, that usually shows up for me when I pass in an object instead of a string. To check whether that's happening, add a print statement, like so:
for links in soup.findAll('p', text=True):
    print(links)
    tokenized_text = nltk.word_tokenize(links)
If it doesn't print text to your terminal (or wherever you run the script from), then you are passing in an object where it expects a string.
Pseudo-code to fix it might look like:
for links in soup.findAll('p', text=True):
    links = links.get_text()  # convert the element to a plain string first
    tokenized_text = nltk.word_tokenize(links)
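If you want to see the string-vs-object distinction without any third-party libraries, here is a standard-library-only sketch of the same idea: it uses `html.parser` in place of BeautifulSoup and a naive whitespace split in place of `nltk.word_tokenize`, but the key step is the same — pull plain text out of the `<p>` tags before scanning it for keywords.

```python
from html.parser import HTMLParser

AID_WORDS = {"financial", "aid", "merit", "scholarship"}

class ParagraphScanner(HTMLParser):
    """Collects text inside <p> tags and records any keyword hits."""
    def __init__(self):
        super().__init__()
        self.in_p = False
        self.hits = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            # data is already a plain str here, so it's safe to tokenize
            for word in data.lower().split():
                word = word.strip(".,!?")
                if word in AID_WORDS:
                    self.hits.append(word)

scanner = ParagraphScanner()
scanner.feed("<p>Full scholarship and financial aid available.</p>")
print(scanner.hits)  # ['scholarship', 'financial', 'aid']
```

The same pattern (get the text first, then tokenize) carries straight over to BeautifulSoup and nltk.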
Upvotes: 1