The Spiteful Octopus

Reputation: 319

Python splitting to the newline character

I have an html file that I am just retrieving the body of text from.
I would like to print it in one single line.

Right now I print it like this:

for line in newName.body(text=True):
    print line

This gives me everything in the body. What I would like is to print something like:

for line in newName.body(text=True):
    print line[257:_____] # this is where i need help

Instead of ____ or choosing another number as the end, I want it to go to the newline character, so it looks like:

for line in newName.body(text=True):
    print line[257:'\n'] 

However, that doesn't work.
How can I make it work?

The text I am working with is located in:

<body>
    <pre>
        The text I want
    </pre>
</body>

Upvotes: 3

Views: 13905

Answers (3)

jfs

Reputation: 414315

You could use the .partition() method to get the first line:

first_line = newName.body.getText().partition("\n")[0]

assuming newName is a BeautifulSoup object. It is usually named soup.

To get text from the first <pre> tag in the html:

text = soup.pre.string

To get a list of lines in the text:

list_of_lines = text.splitlines()

If you want to keep end of line markers in the text:

list_of_lines = text.splitlines(True)

To get i-th line from the list:

ith_line = list_of_lines[i]

Note: indexing is zero-based, e.g., i = 2 corresponds to the 3rd line.
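
A minimal sketch of the string methods above, run on a plain string rather than a parsed page. The sample `text` value is made up for illustration and stands in for whatever the `<pre>` tag contains; the code assumes Python 3.

```python
# Hypothetical sample standing in for the text of the <pre> tag.
text = "first line\nsecond line\nthird line"

# partition() splits at the first separator only and returns a 3-tuple:
# (before, separator, after).
first_line = text.partition("\n")[0]

# splitlines() drops the line endings; splitlines(True) keeps them.
list_of_lines = text.splitlines()
lines_with_endings = text.splitlines(True)

# Zero-based indexing: index 2 is the 3rd line.
ith_line = list_of_lines[2]

print(first_line)
print(list_of_lines)
print(lines_with_endings)
print(ith_line)
```

partition() is a reasonable choice when only the first line is needed, since it stops at the first newline instead of splitting the whole text.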

Upvotes: 8

sotapme

Reputation: 4903

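Do you want line[257:line.find('\n', 257)]? If you are sure the text starts at index 257, then you must equally be sure there is a \n after it: find() returns -1 when there is none, and the slice would then drop the last character.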
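
A hedged sketch of slicing from a fixed offset up to the next newline. str.find() returns -1 when the separator is missing, so the helper below handles that case explicitly, and it passes the start offset to find() so that newlines before the offset are skipped. The name `slice_to_newline` and the sample strings are made up for illustration; Python 3 is assumed.

```python
def slice_to_newline(line, start):
    """Return line[start:] cut off at the first newline at or after start."""
    end = line.find("\n", start)  # search begins at the start offset
    if end == -1:
        # No newline after start: take everything to the end of the string.
        return line[start:]
    return line[start:end]


print(slice_to_newline("abc def\nghi", 4))
print(slice_to_newline("abc def", 4))
```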

Upvotes: 2

Chris Johnson

Reputation: 21956

There is no guarantee that your HTML file has more than one line. The web page may be laid out in lines, but the structure of the page doesn't have to match the structure of the markup and vice versa.

Just to be sure, try this:

print len(newName.body.getText().split('\n'))

If the value is >1, then you should be able to get the line you need like:

newName.body.getText().split('\n')[257]

Maybe not the most graceful way, but it works, if there are in fact multiple lines.
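
The check-then-index idea can be sketched on a plain string as follows, assuming the body text is already available as a single str. `get_line` is a hypothetical helper, not part of any library, and the code assumes Python 3.

```python
def get_line(text, index):
    """Return the line at a zero-based index, or None if there is no such line."""
    lines = text.split("\n")
    if 0 <= index < len(lines):
        return lines[index]
    return None


sample = "alpha\nbeta\ngamma"  # made-up sample text
print(len(sample.split("\n")))
print(get_line(sample, 1))
print(get_line(sample, 257))
```

Returning None for a missing line avoids the IndexError that direct indexing would raise when the text has fewer lines than expected.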

Upvotes: 2
