Linu
Linu

Reputation: 597

Fetching elements from XML and insert into Postgres DB

I have an XML file like this i need to insert this data to PostgreSQL DB.Below is the sample XML and the code which i use ,but i'm not getting any output,Can someone please guide on how to effectively fetch these XML values.

<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0" encoding="utf-8">
  <config>
    <g:system>Magento</g:system>
    <g:extension>Magmodules_Googleshopping</g:extension>
    <g:extension_version>1.6.8</g:extension_version>
    <g:store>emb</g:store>
    <g:url>https://www.xxxxx.com/</g:url>
    <g:products>1320</g:products>
    <g:generated>2020-06-11 11:18:32</g:generated>
    <g:processing_time>17.5007</g:processing_time>
  </config>
  <channel>
    <item>
      <g:id>20</g:id>
      <g:title>product 1</g:title>
      <g:description>description about product 1</g:description>
      <g:gtin>42662</g:gtin>
      <g:brand>company</g:brand>
      <g:mpn>0014</g:mpn>
      <g:link>link.html</g:link>
      <g:image_link>link/c/a/cat_21_16.jpg</g:image_link>
      <g:availability>in stock</g:availability>
      <g:condition>new</g:condition>
      <g:price>9</g:price>
      <g:shipping>
        <g:country>UAE</g:country>
        <g:service>DHL</g:service>
        <g:price>2.90</g:price>
      </g:shipping>
    </item>
    <item>
     .
     .
     .
   </item>

Below is the script which i use, Python : 3.5 Postgres version 11

    # import modules
    import sys
    import psycopg2
    import datetime
    now = datetime.datetime.now()
    # current data and time
    dt = now.strftime("%Y%m%dT%H%M%S")

    # xml tree access
    #from xml.etree import ElementTree
    import xml.etree.ElementTree as ET

    # incremental variable
    x = 0
    with open('/Users/admin/documents/shopping.xml', 'rt',encoding="utf8") as f:
            #tree = ElementTree.parse(f)
            tree = ET.parse(f)
    # connection to postgreSQL database
    try:
        conn=psycopg2.connect(host='localhost', database='postgres',
                                       user='postgres', password='postgres',port='5432')
    except:
        print ("Hey I am unable to connect to the database.")
    cur = conn.cursor()
    # access the xml tree element nodes
    try:
        for node in tree.findall('.//item'):
            src = node.find('id')
            tgt = node.find('mpn')
            print(node)

    except:
        print ("Oops I can't insert record into database table!")
        conn.commit()
        conn.close()

The current output i'm getting is like,

None
None
None

Expected Output,

id title       description    gtin ......
20 product 1   g:description  xxxx .....

Upvotes: 0

Views: 404

Answers (1)

furas
furas

Reputation: 142641

Strange is that you can't find item. It seems you use wrong file and it doesn't have item.

Using your XML data as string and ET.fromstring() I have no problem to get item.

Maybe check print( f.read() ) to see what you really read from file.


Problem is only id, tgt which use namespace - g: - and it need something more then only g:id, g:tgt

tree = ET.fromstring(xml)

ns = {'g': "http://base.google.com/ns/1.0"}

for node in tree.findall('.//item'):

    src = node.find('g:id', ns)
    tgt = node.find('g:mpn', ns)

    print('Node:', node)
    print('src:', src.text)
    print('tgt:', tgt.text)

or use directly as '{http://base.google.com/ns/1.0}id' '{http://base.google.com/ns/1.0}mpn'

tree = ET.fromstring(xml)

for node in tree.findall('.//item'):

    src = node.find('{http://base.google.com/ns/1.0}id')
    tgt = node.find('{http://base.google.com/ns/1.0}mpn')

    print('Node:', node)
    print('src:', src.text)
    print('tgt:', tgt.text)

Minimal working code:

import xml.etree.ElementTree as ET

xml = '''<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0" encoding="utf-8">
  <config>
    <g:system>Magento</g:system>
    <g:extension>Magmodules_Googleshopping</g:extension>
    <g:extension_version>1.6.8</g:extension_version>
    <g:store>emb</g:store>
    <g:url>https://www.xxxxx.com/</g:url>
    <g:products>1320</g:products>
    <g:generated>2020-06-11 11:18:32</g:generated>
    <g:processing_time>17.5007</g:processing_time>
  </config>
  <channel>
    <item>
      <g:id>20</g:id>
      <g:title>product 1</g:title>
      <g:description>description about product 1</g:description>
      <g:gtin>42662</g:gtin>
      <g:brand>company</g:brand>
      <g:mpn>0014</g:mpn>
      <g:link>link.html</g:link>
      <g:image_link>link/c/a/cat_21_16.jpg</g:image_link>
      <g:availability>in stock</g:availability>
      <g:condition>new</g:condition>
      <g:price>9</g:price>
      <g:shipping>
        <g:country>UAE</g:country>
        <g:service>DHL</g:service>
        <g:price>2.90</g:price>
      </g:shipping>
    </item>
  </channel>
</rss>    
'''

tree = ET.fromstring(xml)

ns = {'g': "http://base.google.com/ns/1.0"}

for node in tree.findall('.//item'):

    src = node.find('g:id', ns)
    tgt = node.find('g:mpn', ns)

    print('Node:', node)
    print('src:', src.text)
    print('tgt:', tgt.text)

Result:

Node: <Element 'item' at 0x7f74ba45b710>
src: 20
tgt: 0014

BTW: It works even when I use io.StringIO to simulate file

f = io.StringIO(xml)
tree = ET.parse(f)

Minimal working code:

import xml.etree.ElementTree as ET
import io

xml = '''<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0" encoding="utf-8">
  <config>
    <g:system>Magento</g:system>
    <g:extension>Magmodules_Googleshopping</g:extension>
    <g:extension_version>1.6.8</g:extension_version>
    <g:store>emb</g:store>
    <g:url>https://www.xxxxx.com/</g:url>
    <g:products>1320</g:products>
    <g:generated>2020-06-11 11:18:32</g:generated>
    <g:processing_time>17.5007</g:processing_time>
  </config>
  <channel>
    <item>
      <g:id>20</g:id>
      <g:title>product 1</g:title>
      <g:description>description about product 1</g:description>
      <g:gtin>42662</g:gtin>
      <g:brand>company</g:brand>
      <g:mpn>0014</g:mpn>
      <g:link>link.html</g:link>
      <g:image_link>link/c/a/cat_21_16.jpg</g:image_link>
      <g:availability>in stock</g:availability>
      <g:condition>new</g:condition>
      <g:price>9</g:price>
      <g:shipping>
        <g:country>UAE</g:country>
        <g:service>DHL</g:service>
        <g:price>2.90</g:price>
      </g:shipping>
    </item>
  </channel>
</rss>    
'''

f = io.StringIO(xml)
tree = ET.parse(f)

ns = {'g': "http://base.google.com/ns/1.0"}

for node in tree.findall('.//item'):
    src = node.find('{http://base.google.com/ns/1.0}id')
    tgt = node.find('{http://base.google.com/ns/1.0}mpn')
    print('Node:', node)
    print('src:', src.text)
    print('mpn:', tgt.text)

Upvotes: 1

Related Questions