Jim jason

Reputation: 19

How to remove common words from a parsed paragraph using Python

I need a way to remove common words from the text I parsed out of a webpage. How can I integrate such a method into the code below?

third_headers = ' '.join([r.text for r in soup.find_all('h3')])
third_headers

I got this output: 'HTML and CSS Data Analytics XML Tutorials JavaScript Programming Server Side Web Building Data Analytics XML Tutorials HTML CSS JavaScript Programming Server Side XML Character Sets Exercises Quizzes Courses Certificates Example Example Explained'

I need a new output with the common words removed, using a corpus of common words.

Upvotes: 0

Views: 406

Answers (1)

re-za

Reputation: 790

Assuming we have a corpus of common words in a list called CORPUS:

raw = 'HTML and CSS Data Analytics XML Tutorials JavaScript Programming Server Side Web Building Data Analytics XML Tutorials HTML CSS JavaScript Programming Server Side XML Character Sets Exercises Quizzes Courses Certificates Example Example Explained'


CORPUS = ["And", "So", "If", "etc."]           # placeholder list of common words
corpus = {w.lower() for w in CORPUS}           # lowercase set for case-insensitive lookup

words = raw.split()
processed = [w for w in words if w.lower() not in corpus]

print(processed)
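
If you want a string back rather than a list, the same filtering can be wrapped in a small function. This is only a sketch: `remove_common_words` and `COMMON_WORDS` are hypothetical names, and the tiny set here stands in for whatever real corpus you load (for example, a stop-word list from a library such as NLTK). It also strips surrounding punctuation before comparing, which the snippet above does not do.

```python
import string

# Hypothetical stand-in for a real common-word corpus; swap in a fuller list as needed.
COMMON_WORDS = {"and", "so", "if", "the", "a", "of"}


def remove_common_words(text, common_words=COMMON_WORDS):
    """Return text with common words removed, compared case-insensitively."""
    kept = []
    for word in text.split():
        # Strip surrounding punctuation so that "and," still matches "and".
        bare = word.strip(string.punctuation).lower()
        # Tokens that are pure punctuation become empty and are dropped too.
        if bare and bare not in common_words:
            kept.append(word)  # keep the original spelling and case
    return " ".join(kept)


raw = "HTML and CSS Data Analytics XML Tutorials"
print(remove_common_words(raw))
```

With this you could call `remove_common_words(third_headers)` directly on the joined header text from the question.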

Upvotes: 2
