Split text to get specific part?

Question

I'm using Python to extract a person's country of residence from an order page. The country appears in these lines (address faked):


Buyer Information

 Username:
joedane (6)

 E-Mail: lala@lala.la

 Name & Address:
Joe Dane

XXXX 24

12345 QWERTY

Germany

Seller Information

I need to get the 'Germany' on the third-to-last row. The country and address differ for every order, so I need a way to pull out the country that does not depend on the address lines before it.

I have tried:

# get shipping destination
shippingDest = order.split('<br>Seller Information')[0].split('<br>')[1]

But it doesn't stop at the first BR it finds before that line. Presumably my understanding of split is wrong. This should be an easy problem. Any help?

EDIT:

The actual page continues: after 'Seller Information' there is a block similar to the buyer block with Germany, but with my own country. The script yields Spain, my own country. Can I somehow make it skip my country and return the second one from the end? Going backwards, that would be the one right after 'Seller Information'.

This is the actual content up to the end of the HTML. Everything after Germany is always the same.


Buyer Information

 Username:
joedane (6)

 E-Mail: lala@lala.la

 Name & Address:
Joe Dane

XXXX 24

12345 QWERTY

Germany

Seller Information


 Username: Brick_Top (466)




 Store Name: Top Bricks from Brick Top

 Store Link: http://www.bricklink.com/store.asp?p=Brick_Top

 E-Mail: myemail@gmail.com

 Name & Address:
Gerald Me

qwerty 234

Sevilla 41500

Spain

All I want to get is that Germany (the first of the two countries). Many, many thanks.

EDIT 2.0:

Interestingly enough, I was able to do it just by changing the index to [-5]. I don't understand it well, but my guess is that it picks the fifth BR from the end of the first table.

from bs4 import BeautifulSoup
import sys

soup = BeautifulSoup(open(sys.argv[1], 'r'), 'html')

country = soup.find('table').find_all('br')[-5]
print(country.find_next(text=True).string)

Birei · Accepted Answer

I suggest using an HTML parser such as BeautifulSoup. The script below finds the last <br> of the first table and, from there, searches forward for the next node, text nodes included, which returns the country:

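from bs4 import BeautifulSoup
import sys

# Parse the HTML file given as the first command-line argument.
with open(sys.argv[1], 'r') as html_file:
    soup = BeautifulSoup(html_file, 'html')

# Take the last <br> of the first table; the text node after it is the country.
country = soup.find('table').find_all('br')[-1]
print(country.find_next(text=True).string)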

Run it like:

python3 script.py htmlfile

That yields:

Germany
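
If the page also contains the seller block, a fixed index such as [-1] or [-5] depends on exactly how many <br> tags follow the buyer's country. A possibly more robust variant is to anchor on the 'Seller Information' heading and take the text node just before it. The sketch below is only an illustration under assumptions: it uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra installs, the names TextCollector and buyer_country are made up for this example, and the sample markup is a guess, since the real page's tags are not shown in the question.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect every non-blank text node in document order."""

    def __init__(self):
        super().__init__()
        self.texts = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.texts.append(text)


def buyer_country(html, marker="Seller Information"):
    """Return the last non-blank text node before the marker heading."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    index = collector.texts.index(marker)  # ValueError if the marker is absent
    if index == 0:
        raise ValueError("no text found before the marker")
    return collector.texts[index - 1]


# Hypothetical markup; the actual tags on the real page are an assumption.
sample = (
    "<table><tr><td>"
    "<b>Buyer Information</b><br>"
    "Name &amp; Address:<br>Joe Dane<br>XXXX 24<br>12345 QWERTY<br>Germany<br>"
    "<b>Seller Information</b><br>"
    "Name &amp; Address:<br>Gerald Me<br>qwerty 234<br>Sevilla 41500<br>Spain"
    "</td></tr></table>"
)

print(buyer_country(sample))  # prints: Germany
```

Because the lookup is tied to the heading text rather than to a position in the list of <br> tags, it should keep working if the seller block gains or loses a line, as long as the heading itself stays the same.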
