technocon

Reputation: 15

Find an XML tag and replace the tag name with Python

The KML file I am parsing: http://pastebin.com/kU5rPssk

I am looking for all of the <name> tags that match the regex \<name\>(\d+ \@.*)\<\/name\>, so that I can then manipulate the text of each matching tag.

Here is my code that I used to try to test the regex:

import re
from bs4 import BeautifulSoup

#Open the KML file.
xmldoc = open('doc.kml', "r+")
soup = BeautifulSoup(xmldoc, "xml")

p = re.compile(r"\<name\>(\d+ \@.*)\<\/name\>")

result = re.findall(p, soup)

print result

I get the following error:

Traceback (most recent call last):
File ".\regex_test.py", line 10, in <module>
result = re.findall(p, soup)
File "C:\Python27\lib\re.py", line 177, in findall
return _compile(pattern, flags).findall(string)
TypeError: expected string or buffer

What am I doing wrong?

Upvotes: 1

Views: 351

Answers (1)

alecxe

Reputation: 474071

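re.findall() expects a string, but you are passing it a BeautifulSoup object, which is why you get the TypeError. Rather than running the regex over the markup, pass a compiled regular expression to the text argument of find_all():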

import re
from bs4 import BeautifulSoup

soup = BeautifulSoup(open('doc.kml'), 'xml')
for name in soup.find_all('name', text=re.compile(r"\d+ @.*")):
    print name

It prints:

<kml:name>13233 @ 2014-05-19 21:35:30 GMT (ACPU)</kml:name>
<kml:name>13233 @ 2014-05-19 21:36:30 GMT (ACPU)</kml:name>
<kml:name>13233 @ 2014-05-19 21:37:30 GMT (ACPU)</kml:name>
...
<kml:name>13233 @ 2014-05-19 22:28:30 GMT (ACPU)</kml:name>
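
Since the goal is to manipulate the text of each matching tag, here is a minimal sketch of one way to do that once find_all() has returned the tags. It assumes an inline sample document (SAMPLE_KML, a made-up stand-in for doc.kml) and a hypothetical edit that keeps only the part after "@ "; the helper name rewrite_names is also illustrative, so substitute whatever change you actually need.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical stand-in for doc.kml; the real file is not reproduced here.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <name>Track log</name>
    <Placemark><name>13233 @ 2014-05-19 21:35:30 GMT (ACPU)</name></Placemark>
    <Placemark><name>13233 @ 2014-05-19 21:36:30 GMT (ACPU)</name></Placemark>
  </Document>
</kml>"""

# Same idea as the pattern above, with groups so the parts can be reused.
pattern = re.compile(r"(\d+) @ (.*)")


def rewrite_names(markup):
    """Replace the text of every matching <name> tag and return the soup."""
    soup = BeautifulSoup(markup, "xml")
    changed = []
    for name in soup.find_all("name", text=pattern):
        match = pattern.search(name.string)
        # Assumed edit for illustration: drop the leading "<digits> @ ".
        new_text = match.group(2)
        # replace_with() swaps the tag's text node for the new string.
        name.string.replace_with(new_text)
        changed.append(new_text)
    return soup, changed


soup, changed = rewrite_names(SAMPLE_KML)
print(changed)
```

To save the result, you could write str(soup) back to a file. Newer Beautiful Soup releases prefer the keyword string over text for this filter, so adjust the argument name if your version warns about it.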

Upvotes: 2
