Jonas

Reputation: 99

How to get the total length of the text inside a for loop

How can I get the word count of the whole document as a single number if I'm using a for loop to extract the text from each paragraph?

import requests
from bs4 import BeautifulSoup

url = 'https://en.wikipedia.org/wiki/Mount_Olympus,_Utah'

response = requests.get(url)

soup = BeautifulSoup(response.content, 'lxml')

text = soup.find_all('p')

for text2 in text:
    r = text2.get_text().split()
    print(len(r))
    

output:

117
23
84
73
66
69
9

The problem is that it prints a count for every paragraph. I want the length of the whole document as a single number, like this:

441

Upvotes: 1

Views: 356

Answers (4)

kaumnen

Reputation: 126

You can use a list comprehension with sum():

print(sum([len(text2.get_text().split()) for text2 in text]))

First, you iterate through text: for text2 in text

Then you extract the text with the .get_text() method and split it, and the number of words is stored as an element of the list: len(text2.get_text().split())

Finally, sum() adds up all the elements of that list, and the result is printed.

Edit: just to be clear, that one-liner replaces the whole for loop.
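
To see the idea without fetching the page, here is a minimal sketch. The hard-coded strings are an assumption of mine standing in for what text2.get_text() would return for each paragraph; they are not taken from the real page. It also passes a generator expression to sum() instead of a list, which should give the same total without building the intermediate list.

```python
# Hypothetical stand-ins for the strings returned by text2.get_text()
# for each <p> element; not real content from the page.
paragraphs = [
    "one two three",
    "four five",
    "",
]

# Per-paragraph word counts, like the numbers the loop in the question prints.
counts = [len(p.split()) for p in paragraphs]

# A generator expression inside sum() adds the counts up
# without creating a temporary list first.
total = sum(len(p.split()) for p in paragraphs)

print(counts)
print(total)
```

With the real page you would iterate over the find_all('p') result and call .get_text() on each element, as in the one-liner above.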

Upvotes: 3

Hadrian

Reputation: 927

You can add a variable total_len and increment it by each paragraph's word count:

import requests
from bs4 import BeautifulSoup

url = 'https://en.wikipedia.org/wiki/Mount_Olympus,_Utah'

response = requests.get(url)

soup = BeautifulSoup(response.content, 'lxml')

text = soup.find_all('p')

total_len = 0
for text2 in text:
    r = text2.get_text().split()
    total_len += len(r)

print(total_len)

Upvotes: 1

Dan D.

Reputation: 74645

Sum them up:

total_length = 0
for text2 in text:
    r = text2.get_text().split()
    total_length += len(r)
print(total_length)

Upvotes: 2

Sadaf Shafi

Reputation: 1438

Here:

import requests
from bs4 import BeautifulSoup

url = 'https://en.wikipedia.org/wiki/Mount_Olympus,_Utah'

response = requests.get(url)

soup = BeautifulSoup(response.content, 'lxml')

text = soup.find_all('p')
# Use a name other than sum so the built-in sum() is not shadowed.
total = 0
for text2 in text:
    r = text2.get_text().split()
    total += len(r)
print(total)

Upvotes: 2
