BeautifulSoup missing part of tag

Question

I am processing an XML feed with BeautifulSoup, but for some reason it skips part of the param tag. I have already tried changing the parser (html.parser / html5lib / lxml), but all of them produce the same output.

Can someone help with this?

Original XML file:


 DK28-SLV
 
  Způsob komunikace
  WiFi pro internetové připojení

Output from BeautifulSoup:


 DK28-SLV 
 Způsob komunikace
 WiFi pro internetové připojení

Desired output:


 DK28-SLV 
       -------> This one is missing
  Způsob komunikace
  WiFi pro internetové připojení

My code:

from bs4 import BeautifulSoup
import requests

source = requests.get("my-xml-feed-url").text
soup = BeautifulSoup(source, "lxml")

for product in soup.find_all("shopitem"):
    productno = product.find("productno")
    print(productno)
    param = product.find("param")
    print(param)
    param_name = product.find("param_name")
    print(param_name)
    param_val = product.find("val")
    print(param_val)
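
A likely explanation for the code above, offered as a sketch rather than a confirmed diagnosis: in HTML, `param` is a void element (it cannot have children), and BeautifulSoup's HTML parsers close it immediately, so its children end up as siblings. The "xml" parser (which needs lxml installed) has no such rule. The sample document below is an assumption, rebuilt from the tag names used in the code; note that the "xml" parser is case-sensitive, so the upper-case names are also assumed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the structure and upper-case tag names are assumptions
# based on the tag names searched for in the question's code.
SAMPLE = """<SHOPITEM>
 <PRODUCTNO>DK28-SLV</PRODUCTNO>
 <PARAM>
  <PARAM_NAME>Způsob komunikace</PARAM_NAME>
  <VAL>WiFi pro internetové připojení</VAL>
 </PARAM>
</SHOPITEM>"""

# HTML parser: tag names are lower-cased and <param> is treated as a void
# element, so PARAM_NAME and VAL are not nested inside it.
html_soup = BeautifulSoup(SAMPLE, "html.parser")
html_param = html_soup.find("param")
html_has_children = html_param.find("param_name") is not None

# XML parser: names keep their case and <PARAM> keeps its children.
xml_soup = BeautifulSoup(SAMPLE, "xml")
xml_param = xml_soup.find("PARAM")
xml_name = xml_param.find("PARAM_NAME").get_text(strip=True)

print("html.parser keeps children:", html_has_children)
print("xml parser PARAM_NAME:", xml_name)
```

With the "xml" parser, searches such as `find("shopitem")` have to match the case used in the feed, which may differ from this sample.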

UPDATE: I tested changing the parser to "xml".

It partly helped, and the tag is now shown correctly. But the output is now corrupted in a different place: roughly the second half of the XML is fine, but the first half is not shown at all.

Original XML:


Funkce alarmu
Ano, do mobilní aplikace

Output beginning:

/PARAM_NAME> 
Ano, do mobilní aplikace

This is where the output starts, so for some reason the part of the XML before this point is cut off. The XML structure does not seem to differ before and after this point, so I see no reason for this.
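
One way to check whether the first half is really missing from the parsed tree, or only from what the console displays (for example, because of a limited scrollback buffer), is to compare item counts instead of reading the printed output. This is only a diagnostic sketch: the `count_items` helper, the `SHOPITEM` tag name, and the sample feed are assumptions, and the raw count only works when the opening tag has no attributes.

```python
from bs4 import BeautifulSoup


def count_items(source, tag="SHOPITEM"):
    """Return (occurrences in raw text, elements found by the parser)."""
    # Naive raw count; assumes the opening tag carries no attributes.
    raw = source.count(f"<{tag}>")
    soup = BeautifulSoup(source, "xml")  # requires lxml
    parsed = len(soup.find_all(tag))
    return raw, parsed


# Hypothetical three-item feed standing in for the real one.
SAMPLE = "<SHOP>" + "".join(
    f"<SHOPITEM><PRODUCTNO>P{i}</PRODUCTNO></SHOPITEM>" for i in range(3)
) + "</SHOP>"

raw_count, parsed_count = count_items(SAMPLE)
print(f"raw: {raw_count}, parsed: {parsed_count}")
```

If both numbers match on the real feed, the parser has read every item and the truncation is probably in how the output is displayed; writing the output to a file instead of printing it would confirm that.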

Further output is OK:


   
    Úhel záběru
   
   
    60°
