Lakshya Srivastava

Reputation: 699

Convert Scraped Data to Dictionary

I have an XML file. After I run Beautiful Soup's findAll("named-query") and print the result, I get output like this:

<named-query name="sdfsdfsdf">
        <query>
            ---Query here...--
        </query>
</named-query>

<named-query name="xkjlias">
        <query>
          ---Query here...--
        </query>
</named-query>
   .
   .
   .

Is there a way to convert this into a dictionary, JSON, or CSV, like this:

name="sdfsdfsdf" query = ....

name="xkjlias" query = ....

Thanks in advance.

Upvotes: 0

Views: 178

Answers (2)

kyungmin

Reputation: 484

Code:

import json

from bs4 import BeautifulSoup


text = """
<named-query name="sdfsdfsdf">
    <query>
        ---Query here...--
    </query>
</named-query>

<named-query name="xkjlias">
    <query>
        ---Query here2...--
    </query>
</named-query>"""


soup = BeautifulSoup(text, 'html.parser')
queries = {nq.attrs['name']: nq.text.strip() for nq in soup.find_all('named-query')}
queries_json = json.dumps(queries)

print(queries)  # dict
print(queries_json)  # json

Output:

{'sdfsdfsdf': '---Query here...--', 'xkjlias': '---Query here2...--'}
{"sdfsdfsdf": "---Query here...--", "xkjlias": "---Query here2...--"}
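The question also mentions CSV, which the code above does not cover. A minimal sketch using the standard csv module, assuming the queries dict has the shape printed above (the in-memory buffer and the "name"/"query" column headers are illustrative choices, not something from the question):

```python
import csv
import io

# Same shape as the dict built above: {name attribute: query text}
queries = {
    "sdfsdfsdf": "---Query here...--",
    "xkjlias": "---Query here2...--",
}

# Write to an in-memory buffer; to write a file instead, pass a file object
# opened with open(path, "w", newline="") where path is a name of your choosing.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "query"])   # header row (assumed column names)
writer.writerows(queries.items())    # one row per (name, query) pair

csv_text = buffer.getvalue()
print(csv_text)
```

csv.writer quotes fields that contain commas, quotes, or newlines, so multi-line queries should survive the round trip through csv.reader.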

Upvotes: 1

Shaunak Sen

Reputation: 578

Try this:

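# assumes `soup` is the BeautifulSoup object from the question
# collect one dict per tag, so later tags do not overwrite earlier ones
data = []

# for each 'named-query' tag
for named_query in soup.find_all('named-query'):
    # get the value of the name attribute and store it in a dict
    entry = {'name': named_query.attrs['name']}
    # traverse its children
    for child in named_query.children:
        # skip '\n' and empty strings (child.string can also be None)
        if child.string and child.string.strip():
            entry['query'] = child.string.strip()
    data.append(entry)
print(data)

>>> [{'name': 'sdfsdfsdf', 'query': '---Query here...--'}, {'name': 'xkjlias', 'query': '---Query here...--'}]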
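If one record per tag is what you want, the standard library can also do this without Beautiful Soup. This is a sketch of a different technique (xml.etree.ElementTree), not the approach used above. It assumes the fragment is well-formed XML, and the wrapping <root> tag is a hypothetical name added only because the fragment has several top-level tags:

```python
import xml.etree.ElementTree as ET

text = """
<named-query name="sdfsdfsdf">
    <query>
        ---Query here...--
    </query>
</named-query>

<named-query name="xkjlias">
    <query>
        ---Query here2...--
    </query>
</named-query>"""

# ElementTree needs a single root element, so wrap the fragment (assumed name)
root = ET.fromstring(f"<root>{text}</root>")

# Build one {'name': ..., 'query': ...} record per <named-query> tag
records = [
    {"name": nq.get("name"), "query": (nq.findtext("query") or "").strip()}
    for nq in root.iter("named-query")
]
print(records)
```

Unlike html.parser, ElementTree is strict: a query containing a raw < or & will raise a ParseError unless it is escaped or wrapped in CDATA.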

Upvotes: 1
