How can I extract the text under an HTML div id tag in Python?

Question

I was wondering how I would be able to extract the text from this tag from this website: https://ru.thefreedictionary.com/%d1%88%d1%87%d0%be




            Слово в словаре не найдено.
 Быть может, вы искали:

The code I'm using gets everything under the id tag, but I'm looking only to get the text 'Слово в словаре не найдено.'

soup.findAll("div", attrs = {"id": ["MainTxt"]})

Thank you for any help!

joc · Accepted Answer

First of all, there is no need to combine findAll() with the id attribute. An id is supposed to be unique within an HTML document, so findAll() would return a list with at most one element; find() returns that element directly. Here is how you could solve your problem.

match = soup.find('div', {'id': 'MainTxt'})
text = match.text.rstrip().lstrip().split('\n')

rstrip() and lstrip() remove the whitespace at the end and at the start of the string, respectively (a single strip() call does both). Now text is a list of lines: ['Слово в словаре не найдено. ', ' Быть может, вы искали: ', '', ...]. Getting your target string from it is easy.

target_string = text[0].replace('\n', '')
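
For reference, the steps above can be put together into one self-contained sketch. Note the assumptions: SAMPLE_HTML is a made-up stand-in for the page (the real markup may differ), first_text_line is a hypothetical helper name, and the parser is Python's built-in "html.parser". Instead of indexing text[0] it takes the first non-empty line after stripping, so stray spaces or carriage returns at the ends of a line should not matter.

```python
from typing import Optional

from bs4 import BeautifulSoup

# Hypothetical stand-in for the page markup; the real page may be structured differently.
SAMPLE_HTML = """
<html><body>
<div id="MainTxt">
    Слово в словаре не найдено.\r
    <br>Быть может, вы искали:\r
    <a href="#">що</a>
</div>
</body></html>
"""


def first_text_line(html: str, div_id: str = "MainTxt") -> Optional[str]:
    """Return the first non-empty line of text inside the div with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", id=div_id)  # find() returns one element or None
    if div is None:
        return None
    # splitlines() handles \n, \r and \r\n; strip() removes surrounding whitespace.
    for line in div.get_text().splitlines():
        line = line.strip()
        if line:
            return line
    return None


if __name__ == "__main__":
    print(first_text_line(SAMPLE_HTML))
```

With the sample markup this should give just the first sentence, and it returns None when no div with that id exists, so the caller can check for a missing element before using the result.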
