hawk student

Reputation: 19

How to use the split function on lines of a file in Python?

I have a file with a bunch of information. For example, all of the lines follow the same pattern as this:

     <school>Nebraska</school>

I am trying to use the split function to retrieve only 'Nebraska'. This is what I have so far, but I'm not sure what to add so that it cuts off both tags instead of just the first.

    with open('Pro.txt') as fo:
        for rec in fo:
            print(rec.split('>')[1])

With this I get:

    Nebraska</school

Upvotes: 0

Views: 1220

Answers (3)

Maurice Meyer

Reputation: 18136

You could use a regular expression:

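import re
# Raw string, so the pattern needs no backslash escapes
regexp = re.compile(r'<school>(.*?)</school>')

with open('Pro.txt') as fo:
    for rec in fo:
        # search() rather than match(), so leading whitespace on a line is tolerated
        match = regexp.search(rec)
        if match:
            text = match.group(1)
            print(text)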

Upvotes: 0

宏杰李

Reputation: 12168

s = '<school>Nebraska</school>'

in:

s.split('>')

out:

['<school', 'Nebraska</school', '']

in:

s.split('>')[1].split('<')

out:

['Nebraska', '/school']

in:

s.split('>')[1].split('<')[0]

out:

'Nebraska'

Upvotes: 0

TigerhawkT3

Reputation: 49330

You've cut off part of the string. Keep going in the same fashion:

>>> s = '<school>Nebraska</school>'
>>> s.split('>')[1]
'Nebraska</school'
>>> s.split('>')[1].split('<')[0]
'Nebraska'
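
Applied to the loop from the question, the chained split might look like the sketch below. `extract_text` is a hypothetical helper name, the second sample line is made up for illustration, and `io.StringIO` stands in for `open('Pro.txt')` so the snippet runs on its own. It assumes every line follows the `<tag>text</tag>` pattern.

```python
import io


def extract_text(line):
    """Return the text between the first '>' and the next '<' in line."""
    # Hypothetical helper: assumes the line looks like <tag>text</tag>.
    return line.split('>')[1].split('<')[0]


# Stand-in for open('Pro.txt'); the sample lines are assumptions.
sample = io.StringIO('<school>Nebraska</school>\n<school>Iowa</school>\n')

schools = [extract_text(rec) for rec in sample]
print(schools)
```

A line with no `>` in it would raise an IndexError here, so real data may need a check before indexing.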

That said, you should parse HTML with an HTML parser like BeautifulSoup.
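
BeautifulSoup is a third-party package, so as a stand-in here is a rough sketch of the same idea using the standard library's `xml.etree.ElementTree` instead. This is a swapped-in parser, not BeautifulSoup itself, and it assumes each line is a single well-formed element. `element_text` is a hypothetical helper name and `io.StringIO` stands in for the real file.

```python
import io
import xml.etree.ElementTree as ET


def element_text(line):
    """Parse one line as a standalone XML element and return its text."""
    # Assumption: the line holds exactly one well-formed element.
    # ET.fromstring raises ET.ParseError otherwise.
    return ET.fromstring(line.strip()).text


# Stand-in for open('Pro.txt').
sample = io.StringIO('     <school>Nebraska</school>\n')

for rec in sample:
    print(element_text(rec))
```

Unlike the split approach, a parser does not depend on where the angle brackets fall, which helps if the elements ever gain attributes.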

Upvotes: 1
