Reputation: 79
In the example below, page_numb.text yields the string "pp. 1–25". I am trying to assign the "25" to a variable. For some reason the string ends up in the list unchanged: it isn't split at the separator "-", and the list holds a single string object, "pp. 1–25".
page_numb = page_numb.text
final_page_numb = page_numb.split("-")
final_page_numb = final_page_numb[-1]
print(final_page_numb)
Upvotes: 1
Views: 412
Reputation: 7627
Option 1: Try re.search()
import re
page_numb = "pp. 1–25"
final_page_numb = re.search(r'\d+$', page_numb)[0]  # raw string; matches the trailing digits
print(final_page_numb) # 25
Option 2: Try re.split()
import re
page_numb = "pp. 1–25"
final_page_numb = re.split(r'\D', page_numb)[-1]  # split on every non-digit character
print(final_page_numb) # 25
Upvotes: 1
Reputation: 79
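As suggested in the earlier answers and comments, the separator was indeed a different dash, not a plain hyphen. Weirdly enough, when I typed an em dash on my keyboard (Option + Shift + Minus on a Mac keyboard), it didn't work. When I copied the dash from one of the returned strings, it worked. The reason is that these are two different characters: the string contains an en dash (U+2013, Option + Minus on a Mac), while Option + Shift + Minus types the longer em dash (U+2014).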
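To check which dash a string actually contains, one option is to print the code point and Unicode name of every non-ASCII character. This is a minimal sketch using the standard unicodedata module; the helper name describe_non_ascii and the sample strings are assumptions for illustration, not part of the original code.

```python
import unicodedata

def describe_non_ascii(text):
    """Return (char, code point, Unicode name) for each non-ASCII character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ord(ch) > 127
    ]

# Assumed sample: \u2013 stands in for the dash found in the scraped text.
print(describe_non_ascii("pp. 1\u201325"))
# For comparison, \u2014 is the longer dash typed with Option + Shift + Minus.
print(describe_non_ascii("pp. 1\u201425"))
```

If the two printed names differ, the typed separator and the scraped separator are different characters, and split() will not treat them as equal.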
Upvotes: 1
Reputation: 4642
– is not the same as -.
page_numb.text yields "pp. 1–25", which contains an en dash (U+2013) rather than a normal dash (the ASCII hyphen-minus). Change it to a normal dash before splitting and you'll be fine.
Or, in the split() call, replace - (normal dash) with – (en dash), and the value from page_numb.text will be split.
page_numb = page_numb.text
final_page_numb = page_numb.split("–")
final_page_numbs = final_page_numb[-1]
print(final_page_numbs)
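If the source may use different dashes on different pages, splitting on a set of dash-like characters avoids depending on one exact character. A hedged sketch: the DASHES set is an assumed, non-exhaustive selection, and last_page is a hypothetical helper name.

```python
import re

# Hyphen-minus, hyphen, non-breaking hyphen, figure dash, en dash,
# em dash, and minus sign (an assumed, non-exhaustive set).
DASHES = "-\u2010\u2011\u2012\u2013\u2014\u2212"

def last_page(page_range):
    """Return the text after the last dash-like character, stripped of whitespace."""
    parts = re.split(f"[{re.escape(DASHES)}]", page_range)
    return parts[-1].strip()

print(last_page("pp. 1\u201325"))  # string with an en dash
print(last_page("pp. 1-25"))       # string with a plain hyphen
```

When the string contains no dash at all, last_page returns the whole string stripped, so a caller may want to check for that case separately.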
Upvotes: 2