sathis

Reputation: 115

Find parent Tag based on multiple tag text - BeautifulSoup


Suppose I have the following portion of XML in a file:

<Client name="Jack">
        <Type>premium</Type>
        <Usage>unlimited</Usage>
        <Payment>online</Payment>
</Client>

<Client name="Jill">
        <Type>demo</Type>
        <Usage>limited</Usage>
        <Payment>online</Payment>
</Client>

<Client name="Ross">
        <Type>premium</Type>
        <Usage>unlimited</Usage>
        <Payment>online</Payment>
</Client>

I am using BeautifulSoup for parsing the values.

I need to get the client name based on the <Usage> tag: given the text of a <Usage> tag, I want the name attribute of its parent <Client> tag.

I have the following function for this:

def get_client_for_usage(self, usage):
    """
    To get the client name for specified usage
    """
    usage_items = self.parser.findAll("client")
    client_for_usage = []
    for usages in usage_items:
        try:
            client_set = usages.find("usage", text=usage).findParent("client")
            client_attr = dict(client_set.attrs)
            client_name = client_attr[u'name']
            client_for_usage.append(client_name)

        except AttributeError:
            continue
    return client_for_usage

Now I need to get the client name based on two things: both Usage and Type.

So I need to pass both the type and the usage to get the client name.

Could someone help me with this? If the question is not clear, please let me know so that I can edit it as needed.

Upvotes: 2

Views: 1028

Answers (2)

宏杰李

Reputation: 12158

html = '''<Client name="Jack">
        <Type>premium</Type>
        <Usage>unlimited</Usage>
        <Payment>online</Payment>
</Client>

<Client name="Jill">
        <Type>demo</Type>
        <Usage>limited</Usage>
        <Payment>online</Payment>
</Client>

<Client name="Ross">
        <Type>premium</Type>
        <Usage>unlimited</Usage>
        <Payment>online</Payment>
</Client>'''


import bs4 
import collections

soup = bs4.BeautifulSoup(html, 'lxml')
d = collections.defaultdict(list)
for client in soup('client'):
    type_, usage, payment = client.stripped_strings
    d[(type_, usage)].append(client['name'])

Output:

defaultdict(list,
            {('demo', 'limited'): ['Jill'],
             ('premium', 'unlimited'): ['Jack', 'Ross']})

Use the (type, usage) pair as the key and the list of client names as the value to build a dict; then you can get the names by looking up the key.
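A lookup against that dict might look like the sketch below. To stay runnable without any parsing, it rebuilds the mapping by hand from the output shown above; the helper name `names_for` is hypothetical and not part of the answer's code.

```python
import collections

# Rebuild the mapping from the output above: (type, usage) -> client names.
d = collections.defaultdict(list)
d[("demo", "limited")].append("Jill")
d[("premium", "unlimited")].extend(["Jack", "Ross"])


def names_for(mapping, type_, usage):
    """Return the client names for a (type, usage) pair (hypothetical helper)."""
    # .get() avoids silently creating an empty entry for an unseen key,
    # which plain indexing on a defaultdict would do.
    return mapping.get((type_, usage), [])


print(names_for(d, "premium", "unlimited"))  # ['Jack', 'Ross']
print(names_for(d, "demo", "unlimited"))     # []
```

Using `.get()` rather than `d[key]` keeps the dict unchanged when a combination does not exist.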

Upvotes: 0

rafi wiener

Reputation: 607

Something like this:

def get_client_for_usage(self, usage, tpe):
    """
    Get the client names for the specified usage and type
    """
    usage_items = self.parser.findAll("client")
    client_for_usage = []
    for usages in usage_items:
        try:
            client_set = usages.find("usage", text=usage).findParent("client")
            typ_node = usages.find("type", text=tpe).findParent("client")
            if client_set == typ_node:
                client_for_usage.append(client_set['name'])
        except AttributeError:
            continue
    return client_for_usage
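For reference, the same idea can be sketched as a standalone function. This is a minimal version under a few assumptions: it uses the built-in "html.parser" (which lowercases tag names), the newer `find_all` and `string=` spellings instead of `findAll` and `text=`, and a hypothetical function name `clients_for`. Because each search is already scoped to one client, no `findParent` call is needed.

```python
from bs4 import BeautifulSoup

XML = """<Client name="Jack">
        <Type>premium</Type>
        <Usage>unlimited</Usage>
        <Payment>online</Payment>
</Client>
<Client name="Jill">
        <Type>demo</Type>
        <Usage>limited</Usage>
        <Payment>online</Payment>
</Client>
<Client name="Ross">
        <Type>premium</Type>
        <Usage>unlimited</Usage>
        <Payment>online</Payment>
</Client>"""


def clients_for(soup, usage, type_):
    """Return names of clients whose usage and type both match (hypothetical helper)."""
    names = []
    for client in soup.find_all("client"):
        # find() returns None when no child tag has the requested text.
        usage_tag = client.find("usage", string=usage)
        type_tag = client.find("type", string=type_)
        if usage_tag is not None and type_tag is not None:
            names.append(client["name"])
    return names


soup = BeautifulSoup(XML, "html.parser")
print(clients_for(soup, "unlimited", "premium"))  # ['Jack', 'Ross']
```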

Upvotes: 1
