Reputation: 556
I have a bunch of XML files and I am trying to extract text from them using BeautifulSoup. Here is my code:
text = """
<B510>
<B511><PDAT>G03B 2742</PDAT></B511>
<B512><PDAT>G03B 2758</PDAT></B512>
<B512><PDAT>G03B 2762</PDAT></B512>
<B516><PDAT>7</PDAT></B516>
</B510>
"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(text, 'lxml')
### Classification info
class_info = soup.find_all("b510")
class_info = ", ".join([x.text.strip() for x in class_info])
This is what I get:
G03B 2742
G03B 2758
G03B 2762
7
Why can't I get the text onto a single line?
Upvotes: 0
Views: 28
Reputation: 8913
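find_all("b510") matches only the single <B510> element, and its .text still contains the newlines between the child tags, so join() has nothing to join. Collecting the <PDAT> elements should be enough: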
[i.text for i in soup.find('b510').find_all("pdat")]
Output:
['G03B 2742', 'G03B 2758', 'G03B 2762', '7']
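To get the single line the question asks for, the list can then be joined. The following is a minimal sketch rather than the only way to do it. It assumes the same sample text as the question, and it uses the "html.parser" backend that ships with Python so it runs without lxml installed; for this input, 'lxml' as in the question should behave the same way. The names pdat_texts and one_call are introduced here for illustration only.

```python
from bs4 import BeautifulSoup

text = """
<B510>
<B511><PDAT>G03B 2742</PDAT></B511>
<B512><PDAT>G03B 2758</PDAT></B512>
<B512><PDAT>G03B 2762</PDAT></B512>
<B516><PDAT>7</PDAT></B516>
</B510>
"""

# Assumption: "html.parser" instead of "lxml" to avoid the extra dependency.
# Both parsers lowercase tag names, so the lookups below use "b510" and "pdat".
soup = BeautifulSoup(text, "html.parser")

# One string per <PDAT> element, then join them into a single line.
pdat_texts = [tag.get_text(strip=True) for tag in soup.find("b510").find_all("pdat")]
class_info = ", ".join(pdat_texts)
print(class_info)

# Alternative: let get_text() insert the separator and strip each piece,
# which skips the whitespace-only strings between the tags.
one_call = soup.find("b510").get_text(", ", strip=True)
print(one_call)
```

Both print statements should show the values on one comma-separated line. The first form is easier to extend if you later need to filter or transform individual values; the second is shorter when you only need the joined string.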
Upvotes: 1