I have the following html:
<body><h3>Full Results for race 376338</h3>"Category","Position","Name","Time","Team"<br>"A","1","James","20:20:00","5743"<br><br>"A","2","Matt","20:15:00"<br>
It continues like this, <br> # some text <br>, for hundreds of rows.
I want to start a new line at each <br> so that it is in CSV format, like this:
<body><h3>Full Results for race 376338</h3>"Category","Position","Name","Time","Team"
<br>"A","1","James","20:20:00","5743"<br>
<br>"A","2","Matt","20:15:00"<br>
And I have this code:
from bs4 import BeautifulSoup

soup = BeautifulSoup(html_string, features="html.parser")
for br in soup.find_all('br'):
    soup.replace_with("\n")
With this I get the error: ValueError: Cannot replace one element with another when the element to be replaced is not part of a tree.
What do I need to change?
You want the text attribute.
In [15]: soup.text
Out[15]: 'Full Results for race 376338"Category","Position","Name","Time","Team"\n"A","1","James","20:20:00","5743"\n"A","2","Matt","20:15:00"'
In [16]: soup.text.split()
Out[16]:
['Full',
'Results',
'for',
'race',
'376338"Category","Position","Name","Time","Team"',
'"A","1","James","20:20:00","5743"',
'"A","2","Matt","20:15:00"']
In [17]: soup.text.split()[4:]
Out[17]:
['376338"Category","Position","Name","Time","Team"',
'"A","1","James","20:20:00","5743"',
'"A","2","Matt","20:15:00"']
Or the get_text() method:
In [24]: soup.get_text()
Out[24]: 'Full Results for race 376338"Category","Position","Name","Time","Team"\n"A","1","James","20:20:00","5743"\n"A","2","Matt","20:15:00"'
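get_text() also takes a separator and a strip flag, so you don't depend on the source HTML already containing newlines. The following is a minimal sketch that assumes a shortened html_string from the question and assumes the first line is always the <h3> heading. It then passes the rows to Python's csv module:

```python
import csv

from bs4 import BeautifulSoup

# Shortened copy of the HTML from the question.
html_string = (
    '<body><h3>Full Results for race 376338</h3>'
    '"Category","Position","Name","Time","Team"<br>'
    '"A","1","James","20:20:00","5743"<br><br>'
    '"A","2","Matt","20:15:00"<br>'
)
soup = BeautifulSoup(html_string, features="html.parser")

# Join every text node with "\n" and strip whitespace; empty strings are
# dropped, so each run of text between <br> tags becomes one line.
lines = soup.get_text("\n", strip=True).splitlines()

# Assumption: lines[0] is the <h3> heading and the rest are CSV rows.
heading, csv_lines = lines[0], lines[1:]

# csv.reader splits on the commas and removes the surrounding quotes.
rows = list(csv.reader(csv_lines))
for row in rows:
    print(row)
```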
Or iterate over stripped_strings:
In [25]: [text for text in soup.stripped_strings]
Out[25]:
['Full Results for race 376338',
'"Category","Position","Name","Time","Team"',
'"A","1","James","20:20:00","5743"',
'"A","2","Matt","20:15:00"']
The get_text() and stripped_strings examples come straight from the BeautifulSoup documentation.
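Now the loop in the question. The ValueError happens because replace_with is called on soup, the BeautifulSoup object itself. That object is the root of the tree and has no parent that could hold a replacement. Call replace_with on each br tag instead. Here is a minimal sketch, again using a shortened html_string. Removing the <h3> heading is an assumption added here so that only the CSV text is left:

```python
from bs4 import BeautifulSoup

# Shortened copy of the HTML from the question.
html_string = (
    '<body><h3>Full Results for race 376338</h3>'
    '"Category","Position","Name","Time","Team"<br>'
    '"A","1","James","20:20:00","5743"<br><br>'
    '"A","2","Matt","20:15:00"<br>'
)
soup = BeautifulSoup(html_string, features="html.parser")

# Replace each <br> tag (the loop variable), not the soup object.
for br in soup.find_all("br"):
    br.replace_with("\n")

# Assumption: the heading is not wanted in the CSV output.
soup.h3.decompose()

# "<br><br>" leaves blank lines behind, so skip them.
csv_lines = [line for line in soup.get_text().splitlines() if line.strip()]
print("\n".join(csv_lines))
```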