Reputation: 1471
I am trying to write a program (for practice) that counts how many chapters and verses there are in each book of the Bible. For example, if I want the total number of chapters or verses in book 1, it should give me that total. If I only want the number of verses in chapter 4 of book 2, it should give me only the number of verses in that particular chapter. The same goes for chapters.
So, my logic was to look for the font class tk4l (a font class used only for the body text) on this web site:
http://www.holybible.or.kr/B_NIV/cgi/bibleftxt.php?VR=NIV&VL=1&CN=1&CV=99
If it finds the font class, it adds 1 to my chapter count; if it fails to find the font class, it moves on to the next book (book += 1) and does the same thing.
I was going to use:
import requests
from bs4 import BeautifulSoup
import operator
def read_chapters(max_books, max_chapters):
    book=1
    chapter=1
    while chapter <= max_chapters:
        url = 'http://www.holybible.or.kr/B_NIV/cgi/bibleftxt.php?VR=NIV&VL={}&CN={}&CV=99'.format(book, chapter)
        source_code = requests.get(url).text
        soup = BeautifulSoup(source_code, "html.parser")
        for bible_text in soup.findAll('font', {'class': 'tk4l'}):
and so on...
My questions are:
1) How can I print the chapter count? 2) I have no idea how I should count the number of verses.
I have just started learning Python. Please help me with this.
Upvotes: 1
Views: 4515
Reputation: 1576
First you need to get the HTML content of that page. I recommend using the requests package.
import requests
page = requests.get("http://www.holybible.or.kr/B_NIV/cgi/bibleftxt.php?VR=NIV&VL=1&CN=1&CV=99")
To expand on your idea of counting the uses of the font class tk4l, this could be done by counting occurrences of that substring in the web page content:
verses = page.text.count("font class=tk4l")
print(verses)
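The substring count above depends on the exact spelling `font class=tk4l` in the markup. As a rough sketch of a slightly more tolerant alternative, the tags can be counted with the standard library's `html.parser` instead. The names `FontClassCounter`, `count_verses` and `SAMPLE_HTML` are made up for illustration, and the sample markup is a hand-written stand-in, not the real page; it assumes each verse sits in its own `<font class=tk4l>` tag.

```python
from html.parser import HTMLParser


class FontClassCounter(HTMLParser):
    """Counts <font> start tags whose class attribute equals css_class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; quoted and unquoted
        # attribute values are both reported without the quotes.
        if tag == "font" and dict(attrs).get("class") == self.css_class:
            self.count += 1


def count_verses(html, css_class="tk4l"):
    """Return how many <font class=css_class> tags occur in html."""
    parser = FontClassCounter(css_class)
    parser.feed(html)
    return parser.count


# Hypothetical stand-in for a chapter page: two verse tags, one other tag.
SAMPLE_HTML = (
    "<ol>"
    "<li><font class=tk4l>first verse text</font>"
    '<li><font class="tk4l">second verse text</font>'
    "<li><font class=tk3>something else</font>"
    "</ol>"
)

print(count_verses(SAMPLE_HTML))  # 2
```

With the real page you would pass `page.text` instead of `SAMPLE_HTML`. Whether the result matches the true verse count still depends on the site using that class for verses only.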
To get the number of chapters you could proceed in a similar manner with string operations if you identify a unique attribute about the way they are listed.
EDIT: To expand on the number of chapters: this is a little tricky, since the only attribute I immediately notice is that the chapters are listed in the pagination. Without using any further packages, you could use some string operations to walk through the pagination and find the largest number. The approach is a bit clumsy, and as written it only recognises one- and two-digit chapter numbers, but it should work for identifying the number of chapters on the page you mentioned.
import requests
page = requests.get("http://www.holybible.or.kr/B_NIV/cgi/bibleftxt.php?VR=NIV&VL=1&CN=1&CV=99")
# Keep only the pagination markup between the left and right arrow images.
pagination = page.text.split("http://www.holybible.or.kr/images/l_arrow.gif")[1].split("http://www.holybible.or.kr/images/arrow.gif")[0]
currmax = 0
for i in range(len(pagination)):
    if pagination[i] == ">":
        # One-digit chapter number, e.g. ">7</a>&"
        if pagination[i+2:i+7] == "</a>&":
            if currmax < int(pagination[i+1]):
                currmax = int(pagination[i+1])
        # Two-digit chapter number, e.g. ">42</a>&"
        if pagination[i+3:i+8] == "</a>&":
            if currmax < int(pagination[i+1:i+3]):
                currmax = int(pagination[i+1:i+3])
print(currmax)
EDIT 2: With regular expressions, the same task can be accomplished in a more compact manner:
import requests
import re
page = requests.get("http://www.holybible.or.kr/B_NIV/cgi/bibleftxt.php?VR=NIV&VL=1&CN=1&CV=99")
contents = page.text
x = max(int(i) for i in re.findall(r'>(\d+)</[ab]> ', contents))
print(x)
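One thing to watch with `max(...)` over a generator is that it raises `ValueError` when nothing matches, for example if the page layout changes. A minimal sketch of a more defensive version follows. The helper name `max_chapter` and the `SAMPLE_PAGINATION` string are hypothetical, and the pattern deliberately drops any requirement on what follows the closing tag, which is an assumption about the markup rather than something checked against the real site.

```python
import re


def max_chapter(html):
    """Return the largest number found as the whole text of an <a> or <b> tag.

    Returns 0 when no such number is present instead of raising.
    """
    numbers = [int(n) for n in re.findall(r">(\d+)</[ab]>", html)]
    return max(numbers, default=0)


# Hypothetical stand-in for the pagination strip; not copied from the site.
SAMPLE_PAGINATION = (
    '<a href="?CN=1">1</a>&nbsp;'
    "<b>2</b>&nbsp;"
    '<a href="?CN=3">3</a>&nbsp;'
    '<a href="?CN=50">50</a>&nbsp;'
)

print(max_chapter(SAMPLE_PAGINATION))  # 50
```

Because `\d+` matches any number of digits, this also copes with books that have more than 99 chapters. On the real page it is probably safer to apply it only to the pagination part of the HTML, so that unrelated numbered links elsewhere on the page are not picked up.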
Upvotes: 2