Reputation: 37
I did python script:
from string import punctuation
from collections import Counter
import urllib
from stripogram import html2text
myurl = urllib.urlopen("https://www.google.co.in/?gfe_rd=cr&ei=v-PPV5aYHs6L8Qfwwrlg#q=samsung%20j7")
html_string = myurl.read()
text = html2text( html_string )
file = open("/home/nextremer/Final_CF/contentBased/contentCount/hi.txt", "w")
file.write(text)
file.close()
Using this script I didn't get perfect output only some HTML code.
Upvotes: 1
Views: 3017
Reputation: 405
You dont need to write any hard algorithms to extract data from search result. Google has a API to do this.
Here is an example:
https://github.com/google/google-api-python-client/blob/master/samples/customsearch/main.py
But to use it, first you must to register in google for API Key.
All information you can find here:
https://developers.google.com/api-client-library/python/start/get_started
Upvotes: 0
Reputation: 135
import urllib
urllib.urlretrieve("http://www.example.com/test.html", "test.txt")
Upvotes: 0
Reputation: 139
What do you mean with "webpage text"? It seems you don't want the full HTML-File. If you just want the text you see in your browser, that is not so easily solvable, as the parsing of a HTML-document can be very complex, especially with JavaScript-rich pages. That starts with assessing if a String between "<" and ">" is a regular tag and includes analyzing the CSS-Properties changed by JavaScript-behavior.
That is why people write very big and complex rendering-Engines for Webpage-Browsers.
Upvotes: 2