Reputation: 9
I'm writing a script that captures data from a website and saves it to a database. Some of the values are merged into one string, and I need to split them. I have something like this:
Endokrynologia (bez st.),Położnictwo i ginekologia (II st.)
So I need to get:
Endokrynologia (bez st.)
Położnictwo i ginekologia (II st.)
So I wrote some code in Python:
#!/usr/bin/env python
# -*- encoding: utf-8
import MySQLdb as mdb
from lxml import html, etree
import urllib
import sys
import re
Nr = 17268
Link = "http://rpwdl.csioz.gov.pl/rpz/druk/wyswietlKsiegaServletPub?idKsiega="
sock = urllib.urlopen(Link+str(Nr))
htmlSource = sock.read()
sock.close()
root = etree.HTML(htmlSource)
result = etree.tostring(root, pretty_print=True, method="html")
Spec = etree.XPath("string(//html/body/div/table[2]/tr[18]/td[2]/text())")
Specjalizacja = Spec(root)
if re.search(r'(,)\b', Specjalizacja):
    text = Specjalizacja.split()
    print text[0]
    print text[1]
and I get:
Endokrynologia
(bez
What am I doing wrong?
Upvotes: 1
Views: 102
Reputation: 1506
Try replacing
text = Specjalizacja.split()
with
text = Specjalizacja.split(',')
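With no argument, split() breaks the string on whitespace, which is why you got "Endokrynologia" and "(bez"; passing ',' splits on the comma instead. I haven't tested it against your page, though.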
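To see the difference without hitting the website, here is a minimal sketch that uses the sample string from the question as a hard-coded value. The variable names by_whitespace and by_comma are only illustrative, and the strip() call is an assumption in case the real data has spaces around the comma.

```python
# -*- coding: utf-8 -*-
# Sample value copied from the question (hard-coded instead of scraped).
specjalizacja = "Endokrynologia (bez st.),Położnictwo i ginekologia (II st.)"

# No argument: split() cuts at every run of whitespace,
# so the first two items are only fragments of the first name.
by_whitespace = specjalizacja.split()

# Explicit separator: split(",") cuts only at commas.
# strip() is an assumption: it removes any spaces left around each part.
by_comma = [part.strip() for part in specjalizacja.split(",")]

for part in by_comma:
    print(part)
```

Note that this would also split a single specialization whose name itself contains a comma, so it only fits if commas appear solely as separators in your data.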
Upvotes: 0