Reputation: 99
How can I get the total length of the text of all the paragraphs in a document as a single number, if I'm using a for loop to extract the text first?
import requests
from bs4 import BeautifulSoup
url = 'https://en.wikipedia.org/wiki/Mount_Olympus,_Utah'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'lxml')
text = soup.find_all('p')
for text2 in text:
    r = text2.get_text().split()
    print(len(r))
output:
117
23
84
73
66
69
9
The problem is that it prints a separate word count for every paragraph. I want the length of the whole document as a single number, like this:
441
Upvotes: 1
Views: 356
Reputation: 126
You can use a list comprehension with sum():
print(sum([len(text2.get_text().split()) for text2 in text]))
First, you iterate through text: for text2 in text.
Then, for each element, you extract its text with the .get_text() method, split it into words, and store the word count as an element of the list: len(text2.get_text().split()).
Finally, sum() adds up all the elements of that list, and print() displays the total.
Edit: just to be clear, that one-liner replaces the whole for loop.
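If you want to try the same idea without hitting the network, here is a minimal sketch. The paragraphs list and the count_words helper are made-up names for illustration; the strings just stand in for whatever text2.get_text() would return. It also passes a generator expression to sum() instead of building a list first, which should behave the same here.

```python
# Hypothetical sample strings standing in for the results of text2.get_text().
paragraphs = [
    "Mount Olympus is a mountain in Utah",
    "It overlooks the Salt Lake Valley",
    "",  # an empty paragraph contributes zero words
]


def count_words(texts):
    """Return the total number of whitespace-separated words across all texts."""
    # A generator expression feeds each per-text count straight into sum().
    return sum(len(t.split()) for t in texts)


total = count_words(paragraphs)
print(total)
```

With the real page you would call it as count_words(p.get_text() for p in text), assuming text is the result of soup.find_all('p') from the question.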
Upvotes: 3
Reputation: 927
You can add a variable total_len, initialize it to 0 before the loop,
and increment it by the word count of each element:
import requests
from bs4 import BeautifulSoup
url = 'https://en.wikipedia.org/wiki/Mount_Olympus,_Utah'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'lxml')
text = soup.find_all('p')
total_len = 0
for text2 in text:
    r = text2.get_text().split()
    total_len += len(r)
print(total_len)
Upvotes: 1
Reputation: 74645
Sum them up:
total_length = 0
for text2 in text:
    r = text2.get_text().split()
    total_length += len(r)
print(total_length)
Upvotes: 2
Reputation: 1438
Here:
import requests
from bs4 import BeautifulSoup
url = 'https://en.wikipedia.org/wiki/Mount_Olympus,_Utah'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'lxml')
text = soup.find_all('p')
# Named total rather than sum, so the built-in sum() is not shadowed.
total = 0
for text2 in text:
    r = text2.get_text().split()
    total = total + len(r)
print(total)
Upvotes: 2