How can I remove the HTML content from the output?

Tags: python-2.7, web-crawler
Asked by shubham

```python
import urllib

data = urllib.urlopen("https://www.python.org/")
for line in data:
    line = line.strip()  # strip() returns a new string, so keep the result
    print line
```

I am trying to make a web crawler, but when I run the above code, some HTML markup also gets printed. I only want the text portion of the web page and the hyperlinks.

Answers (2)

Answer by priyanka (accepted)

Use the Beautiful Soup library to build the web crawler and handle the HTML tags.

Answer by BeaumontTaz

A somewhat rudimentary solution would be to split the HTML on the "<" and ">" characters and then go through the resulting list, removing every piece that starts at a "<" and ends at the next ">" (that is, the tags themselves).
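The accepted answer recommends Beautiful Soup. To show what handling the tags with a real HTML parser looks like without installing anything, here is a minimal sketch that swaps in the standard library's `html.parser.HTMLParser` (Python 3) in place of Beautiful Soup. It collects the visible text and the `href` of every `<a>` tag, which are the two things the question asks for. The names `TextAndLinkExtractor` and `extract_text_and_links` are hypothetical, and treating everything except `<script>` and `<style>` bodies as "text" is an assumption.

```python
from html.parser import HTMLParser


class TextAndLinkExtractor(HTMLParser):
    """Collect visible text and <a href> targets from an HTML document."""

    # Tags whose contents are code or styling rather than readable text.
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            # attrs is a list of (name, value) pairs; value may be None.
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-blank text that is not inside <script>/<style>.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract_text_and_links(html):
    """Return (text, links): stripped text pieces joined by spaces, plus hrefs."""
    parser = TextAndLinkExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.links


if __name__ == "__main__":
    sample = (
        "<html><head><title>Demo</title><style>p {color: red}</style></head>"
        "<body><p>Hello <b>world</b></p>"
        '<a href="https://www.python.org/">Python</a>'
        "<script>var x = 1;</script></body></html>"
    )
    text, links = extract_text_and_links(sample)
    print(text)   # Demo Hello world Python
    print(links)  # ['https://www.python.org/']
```

To use it on a live page in Python 3, read and decode the response first, for example `urllib.request.urlopen("https://www.python.org/").read().decode("utf-8")`, assuming the page is UTF-8. The question targets Python 2.7, where the module is named `HTMLParser` and the sketch would need minor changes. With Beautiful Soup 4 installed, `soup.get_text()` and `soup.find_all("a", href=True)` cover the same two needs.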

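BeaumontTaz's split-based idea can be sketched as follows, assuming Python 3 and a page already read into a string. `re.split` with a capturing group keeps the "<" and ">" delimiters as their own list items, so a single flag is enough to track whether the current piece lies inside a tag. The function name `strip_tags_by_splitting` is hypothetical.

```python
import re


def strip_tags_by_splitting(html):
    """Rudimentary tag removal: drop everything from a '<' to the next '>'."""
    # Split on '<' and '>' but keep the delimiters in the list,
    # e.g. "a<b>c" -> ['a', '<', 'b', '>', 'c'].
    tokens = re.split(r"([<>])", html)
    kept = []
    inside_tag = False
    for token in tokens:
        if token == "<":
            inside_tag = True
        elif token == ">":
            inside_tag = False
        elif not inside_tag:
            # Only pieces outside any <...> span are treated as text.
            kept.append(token)
    return "".join(kept)


if __name__ == "__main__":
    print(strip_tags_by_splitting("<p>Hello <b>world</b></p>"))  # Hello world
```

As the answer itself says, this is rudimentary: the body of a `<script>` or `<style>` element is kept (`strip_tags_by_splitting("<script>var x = 1;</script>ok")` returns `"var x = 1;ok"`), a literal ">" in ordinary text is dropped, and a ">" inside an attribute value ends the tag early. It also throws the hyperlinks away along with the tags, so it only covers the "text portion" half of the question.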