Earl

Reputation: 11

Python POST request encoding

Here's the situation: I'm sending POST requests and trying to fetch the response with Python. The problem is that non-Latin letters in the response come out distorted. This doesn't happen when I fetch the same page through a direct link (with no search results), but POST requests won't generate a link.

Here's what I do:

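import urllib
import urllib2

url = 'http://donelaitis.vdu.lt/main_helper.php?id=4&nr=1_2_11'
# Form fields, already URL-encoded (non-Latin characters are UTF-8 percent-escapes)
data = 'q=bus&ieskoti=true&lang1=en&lang2=en+-%3E+lt+%28+71813+lygiagre%C4%8Di%C5%B3+sakini%C5%B3+%29&lentele=vertikalus&reg=false&rodyti=dalis&rusiuoti=freq'
req = urllib2.Request(url, data)  # passing data makes this a POST request
response = urllib2.urlopen(req)
the_page = response.read()  # raw bytes as sent by the server
# Write the bytes unchanged; "wb" avoids newline translation on Windows
with open("pagesource.txt", "wb") as f:
    f.write(the_page)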

Whenever I try

thepage = the_page.encode('utf-8')

I get this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 1008: ordinal not in range(128)

Whenever I try to change the response header to Content-Type: text/html;charset=utf-8 with

response['Content-Type'] = 'text/html;charset=utf-8'

I get this error:

AttributeError: addinfourl instance has no attribute '__setitem__'

My question: is it possible to edit or remove response or request headers? If not, is there another way to solve this problem other than copying the source into Notepad++ and fixing the encoding manually?

I'm new to Python and data mining, so please let me know if I'm doing something wrong.

Thanks

Upvotes: 1

Views: 8255

Answers (2)

jsbueno

Reputation: 110311

Why don't you try thepage = the_page.decode('utf-8') instead of encode? What you want is to go from UTF-8 encoded text to unicode, the encoding-agnostic internal string type.
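
To make the difference concrete, here is a minimal sketch. The sample bytes are the UTF-8 form of one word from the POST data in the question, and the variable names are only illustrative. In Python 2, calling encode on a byte string first decodes it implicitly with the ascii codec, which is where the UnicodeDecodeError in the question comes from.

```python
# Bytes as they would arrive from response.read(): UTF-8 encoded text.
raw = b'lygiagre\xc4\x8di\xc5\xb3'

# decode: bytes -> text (unicode in Python 2, str in Python 3).
text = raw.decode('utf-8')

# encode: text -> bytes. Round-tripping gives back the original bytes.
round_trip = text.encode('utf-8')

# Decoding the same bytes as ASCII fails because 0xc4 is outside the
# 7-bit range. Python 2 performs this ascii decode implicitly when
# encode() is called on a byte string.
try:
    raw.decode('ascii')
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc
```

The snippet avoids any network access, so it can be run as-is to see that decode succeeds where the implicit ascii step fails.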

Upvotes: 2

Daniel Roseman

Reputation: 599628

Two things. Firstly, you don't want to encode the response, you want to decode it:

thepage = the_page.decode('utf-8')
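
As a rough sketch of how the decoded text could then be saved, the following assumes the server declares its charset in the response's Content-Type header and falls back to UTF-8 otherwise. The helper names charset_from_content_type and decode_page are hypothetical, and the byte string and header value are stand-ins for response.read() and the real header.

```python
import io
import os
import tempfile
from email.message import Message


def charset_from_content_type(content_type, default='utf-8'):
    """Return the charset parameter of a Content-Type value, or default."""
    msg = Message()
    msg['Content-Type'] = content_type
    return msg.get_content_charset() or default


def decode_page(raw_bytes, content_type):
    """Decode response bytes using the charset named in content_type."""
    return raw_bytes.decode(charset_from_content_type(content_type))


# Stand-ins for response.read() and the response's Content-Type header.
raw = b'<p>sakini\xc5\xb3</p>'
page = decode_page(raw, 'text/html; charset=UTF-8')

# io.open encodes the text on the way out, so the file is valid UTF-8.
path = os.path.join(tempfile.mkdtemp(), 'pagesource.txt')
with io.open(path, 'w', encoding='utf-8') as out:
    out.write(page)
```

Opening the output file with an explicit encoding keeps the decode step and the write step symmetric, so no implicit ascii conversion happens in between.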

And secondly, you don't want to set the header on the response, you set it on the request, using the add_header method:

req.add_header('Content-Type', 'text/html;charset=utf-8')
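
For illustration, request headers can be set and inspected before anything is sent. This sketch uses a shortened form body rather than the full one from the question, and the import fallback is only there so it runs on both Python 2 and Python 3. Whether this particular server changes its response based on the header is not something the thread confirms.

```python
try:
    from urllib.request import Request   # Python 3
except ImportError:
    from urllib2 import Request          # Python 2, as in the question

url = 'http://donelaitis.vdu.lt/main_helper.php?id=4&nr=1_2_11'
data = b'q=bus&ieskoti=true'  # shortened form body, for illustration only

req = Request(url, data)
req.add_header('Content-Type', 'text/html;charset=utf-8')

# The Request object stores header names capitalized, i.e. 'Content-type'.
content_type = req.get_header('Content-type')
has_it = req.has_header('Content-type')

# Nothing is sent over the network until urlopen(req) is called.
```

The response object returned by urlopen only exposes the headers the server sent, which is why assigning to it fails in the question.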

Upvotes: 1
