Reputation: 319
I have an HTML file from which I am retrieving just the body text.
I would like to print only a single line of it.
Right now I print it like this:
for line in newName.body(text=True):
    print line
This gives me everything in the body. What I would like is to print something like:
for line in newName.body(text=True):
    print line[257:_____]  # this is where I need help
Instead of ____ or choosing another number as the end, I want it to go to the newline character, so it looks like:
for line in newName.body(text=True):
    print line[257:'\n']
However, that doesn't work.
How can I make it work?
The text I am working with is located in:
<body>
<pre>
The text I want
</pre>
</body>
Upvotes: 3
Views: 13905
Reputation: 414315
You could use the .partition() method to get the first line:
first_line = newName.body.getText().partition("\n")[0]
This assumes newName is a BeautifulSoup object. It is usually named soup.
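To show what .partition() returns, here is a minimal sketch on a plain string. The sample value is made up and only stands in for whatever newName.body.getText() gives you, and I am assuming Python 3 syntax:

```python
# Hypothetical stand-in for the text returned by newName.body.getText().
sample = "first line\nsecond line\nthird line"

# partition() splits at the FIRST occurrence of the separator and returns
# a 3-tuple: (text before, the separator itself, text after).
head, sep, tail = sample.partition("\n")
first_line = head

# If the separator never occurs, the whole string comes back as the first
# element and the other two elements are empty strings.
only_head, no_sep, no_tail = "no newline here".partition("\n")

print(first_line)
```

Because a missing newline just yields the whole string, [0] is safe to use even on single-line text.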
To get the text from the first <pre> tag in the HTML:
text = soup.pre.string
To get a list of lines in the text:
list_of_lines = text.splitlines()
If you want to keep end of line markers in the text:
list_of_lines = text.splitlines(True)
To get i-th line from the list:
ith_line = list_of_lines[i]
Note: indexing is zero-based, e.g., i = 2 corresponds to the 3rd line.
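The splitlines() calls above can be tried on a plain string first. This is only a sketch: the text value is invented and stands in for the contents of the <pre> tag, and Python 3 syntax is assumed:

```python
# Hypothetical stand-in for the text taken from the <pre> tag.
text = "alpha\nbeta\r\ngamma\n"

# Without arguments, line endings (\n, \r\n, ...) are stripped and a
# trailing newline does not produce an extra empty entry.
list_of_lines = text.splitlines()

# Passing True keeps the original line endings on each entry.
list_with_ends = text.splitlines(True)

# Zero-based indexing: i = 2 is the third line.
i = 2
ith_line = list_of_lines[i]

print(list_of_lines)
print(list_with_ends)
print(ith_line)
```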
Upvotes: 8
Reputation: 4903
Is it that you want line[257:line.find('\n')]? If you are sure the text starts at 257, then you must equally be sure there is a \n in it.
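The caveat matters because str.find() returns -1 when nothing is found, and -1 as a slice end silently drops the last character. Here is a rough sketch of a guarded version. The helper name slice_to_newline is my own invention, and unlike the one-liner above it starts searching at the start offset rather than at the beginning of the string:

```python
def slice_to_newline(line, start):
    """Return line[start:] up to, but not including, the next newline."""
    # Search from `start` so that an earlier newline is ignored.
    end = line.find("\n", start)
    if end == -1:
        # No newline after `start`: take the rest of the string.
        return line[start:]
    return line[start:end]


# The unguarded slice loses the final character when there is no newline.
unguarded = "abcdef"[2:"abcdef".find("\n")]

print(slice_to_newline("abcdef\nghi", 2))
print(slice_to_newline("abcdef", 2))
print(unguarded)
```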
Upvotes: 2
Reputation: 21956
There is no guarantee that your HTML file has more than one line. The web page may be laid out in lines, but the structure of the page doesn't have to match the structure of the markup and vice versa.
Just to be sure, try this:
print len(''.join(newName.body(text=True)).split('\n'))
If the value is greater than 1, then you should be able to get the line you need like:
''.join(newName.body(text=True)).split('\n')[257]
Maybe not the most graceful way, but it works, if there are in fact multiple lines.
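If you are not sure how many lines there are, you can check the count before indexing. A small sketch, assuming Python 3 and using a made-up helper name get_line on plain strings rather than on real BeautifulSoup output:

```python
def get_line(text, index):
    """Return the line at `index`, or None if the text has too few lines."""
    lines = text.split("\n")
    if 0 <= index < len(lines):
        return lines[index]
    return None


# split("\n") keeps a trailing empty string when the text ends with a
# newline, which splitlines() does not.
with_split = "a\nb\n".split("\n")
with_splitlines = "a\nb\n".splitlines()

print(get_line("a\nb\nc", 1))
print(get_line("single line", 257))
```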
Upvotes: 2