Arseniy Krupenin

Reputation: 3880

How to get text from url

I have some URLs:

http://go.mail.ru/search?fr=vbm9&fr2=query&q=%D0%BF%D1%80%D0%BE%D0%B3%D1%83%D0%BB%D0%BA%D0%B0+%D0%B0%D0%BA%D1%82%D0%B5%D1%80%D1%8B&us=10&usln=1
https://www.google.ru/search?q=NaoOmiKi&oq=NaoOmiKi&aqs=chrome..69i57j69i61&sourceid=chrome&es_sm=0&ie=UTF-8
https://yandex.ru/search/?text=%D0%BE%D1%82%D0%BA%D1%83%D0%B4%D0%B0%20%D0%B2%D0%B5%D0%B7%D1%83%D1%82%20%D0%BE%D0%B4%D0%B5%D0%B6%D0%B4%D1%83%20%D0%B2%20%D1%81%D0%B5%D0%BA%D0%BE%D0%BD%D0%B4%20%D1%85%D0%B5%D0%BD%D0%B4&clid=2073067

When I open these URLs in a browser, I see that they are searches for:

прогулка актеры
NaoOmiKi
откуда везут одежду в секонд хенд

I want to write code that extracts these values. I tried:

import urllib
from urlparse import urlparse
get = urlparse(url)
print urllib.unquote(get[4])

But it doesn't work correctly for all of the URLs. What should I use?

Upvotes: 0

Views: 397

Answers (1)

Jieter

Reputation: 4229

urlparse parses a URL into 6 components: scheme, netloc, path, params, query, fragment. You correctly use index 4, which is the query (the path is index 2).

The query, however, is a &-separated string of key=value pairs with the values URL-encoded. You unquote the entire string, while you are only interested in the value of the text or q key.
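To see what ends up in each component, here is a small sketch. It uses a shortened version of the Google URL from the question, and the try/except import is only there so the snippet runs on both Python 2 and Python 3:

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2

# Shortened from the question's Google URL for readability.
url = "https://www.google.ru/search?q=NaoOmiKi&oq=NaoOmiKi&ie=UTF-8"
parts = urlparse(url)

# The result is a 6-tuple that also exposes named attributes.
scheme, netloc, path, params, query, fragment = parts

print(path)                     # /search
print(parts[4])                 # q=NaoOmiKi&oq=NaoOmiKi&ie=UTF-8
print(parts.query == parts[4])  # True
```

Index 4 is the whole query string with every key in it, which is why unquoting it directly does not give just the search text.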

You can use urlparse.parse_qs to parse the query string and look up the q or text key in the returned dict. Each value in that dict is a list, because a key may appear more than once in a query string.
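A minimal sketch of that approach could look like the following. The helper name extract_search_text and the SEARCH_KEYS tuple are made up for illustration; the key names q and text are just the ones that appear in the three example URLs, so other search engines may need other keys:

```python
try:
    from urllib.parse import urlparse, parse_qs  # Python 3
except ImportError:
    from urlparse import urlparse, parse_qs      # Python 2

# Assumption: the search phrase lives under one of these keys.
SEARCH_KEYS = ("q", "text")

def extract_search_text(url):
    """Return the decoded search phrase from a search URL, or None."""
    query = urlparse(url).query   # same value as index 4 of the result
    params = parse_qs(query)      # percent-decoded values, each one a list
    for key in SEARCH_KEYS:
        if key in params:
            return params[key][0]  # take the first occurrence of the key
    return None

print(extract_search_text(
    "https://www.google.ru/search?q=NaoOmiKi&oq=NaoOmiKi&ie=UTF-8"))  # NaoOmiKi
```

parse_qs also turns + into a space, so the go.mail.ru URL needs no special handling. On Python 2 the returned values are UTF-8 byte strings, so you would probably want to call .decode('utf-8') on them before working with the Cyrillic text.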

Upvotes: 2
