Alex

Reputation: 231

How to select a specific row from a table using BeautifulSoup?

This is a follow-up to my previous question, "Is there a way to parse data from multiple pages from a parent webpage?". I realized I need to go one level deeper to get the 11-digit NDC code instead of the 10-digit one. Rather than convert the codes later, I thought I could grab the 11-digit version directly.

What I want to do is follow the links on the 2nd-level pages (the ones with the 10-digit NDC codes) and then grab the 11-digit NDC code from the page each link leads to (the 3rd level).

I can write the code that gets to that page, but I'm not sure how to select the value. The number is nested inside one tag and then another, and I only want one specific row of the table. I thought I could index into the rows like this, but my list ends up full of None and td entries. Here is my code:

import requests
from bs4 import BeautifulSoup
url = 'https://ndclist.com/?s=Trospium'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

all_data = []
for a in soup.select('[data-title="NDC"] a[href]'):
    link_url = a['href']
    print('Processing link {}...'.format(link_url))

    soup2 = BeautifulSoup(requests.get(link_url).content, 'html.parser')
    for b in soup2.select('#product-packages a'):
        link_url2 = b['href']
        print('Processing link {}... '.format(link_url2))
        soup3 = BeautifulSoup(requests.get(link_url2).content, 'html.parser')
        for link in soup3.findAll('tr', limit=7)[1]:
            print(link.name)
            all_data.append(link.name)

print('Trospium')
print(all_data)
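
For reference, the None and td entries most likely come from looping over the row itself: iterating over a tag walks its direct children, and the whitespace between cells counts as a child with no tag name. Below is a minimal offline sketch of that behavior. The HTML fragment is made up for illustration and the real package page may be structured differently.

```python
from bs4 import BeautifulSoup

# Made-up fragment standing in for a package page (an assumption, not the real markup).
html = """
<table>
  <tr><td>Field</td><td>Value</td></tr>
  <tr>
    <td>11-Digit NDC Billing Format</td>
    <td>00574011830</td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
row = soup.find_all("tr", limit=7)[1]  # second row, as in the loop above

# Iterating over a Tag yields its direct children, whitespace text nodes included.
# Text nodes have no tag name, so they show up as None next to the 'td' entries.
child_names = [getattr(child, "name", None) for child in row]
print(child_names)

# Asking for the cells explicitly skips the text nodes.
cell_texts = [td.get_text(strip=True) for td in row.find_all("td")]
print(cell_texts)
```

So indexing the rows can work, but the loop has to go over row.find_all("td") and read the cell text rather than the tag name.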

Upvotes: 2

Views: 221

Answers (1)

Andrej Kesely

Reputation: 195418

Just minor modifications to your code:

import requests
from bs4 import BeautifulSoup
url = 'https://ndclist.com/?s=Trospium'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

all_data = []
for a in soup.select('[data-title="NDC"] a[href]'):
    link_url = a['href']
    print('Processing link {}...'.format(link_url))

    soup2 = BeautifulSoup(requests.get(link_url).content, 'html.parser')
    for b in soup2.select('#product-packages a'):
        link_url2 = b['href']
        print('\tProcessing link {}... '.format(link_url2))
        soup3 = BeautifulSoup(requests.get(link_url2).content, 'html.parser')
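        # Find the cell labelled "11-Digit NDC Billing Format" and take the cell right after it.
        # Note: recent soupsieve releases deprecate ':contains' in favor of ':-soup-contains'.
        ndc_billing_format = soup3.select_one('td:contains("11-Digit NDC Billing Format") + td').contents[0].strip()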
        print('\t\t{}'.format(ndc_billing_format))
        all_data.append(ndc_billing_format)

print('Trospium')
print(all_data)

Prints:

Processing link https://ndclist.com/ndc/0574-0118...
    Processing link https://ndclist.com/ndc/0574-0118/package/0574-0118-30... 
        00574011830
Processing link https://ndclist.com/ndc/0574-0145...
    Processing link https://ndclist.com/ndc/0574-0145/package/0574-0145-60... 
        00574014560
Processing link https://ndclist.com/ndc/0591-3636...
    Processing link https://ndclist.com/ndc/0591-3636/package/0591-3636-05... 
        00591363605
    Processing link https://ndclist.com/ndc/0591-3636/package/0591-3636-30... 
        00591363630
    Processing link https://ndclist.com/ndc/0591-3636/package/0591-3636-60... 
        00591363660
Processing link https://ndclist.com/ndc/23155-530...
    Processing link https://ndclist.com/ndc/23155-530/package/23155-530-02... 
        23155053002
    Processing link https://ndclist.com/ndc/23155-530/package/23155-530-05... 
        23155053005
    Processing link https://ndclist.com/ndc/23155-530/package/23155-530-06... 
        23155053006
Processing link https://ndclist.com/ndc/42291-846...
    Processing link https://ndclist.com/ndc/42291-846/package/42291-846-60... 
        42291084660
Processing link https://ndclist.com/ndc/60429-098...
    Processing link https://ndclist.com/ndc/60429-098/package/60429-098-30... 
        60429009830
Processing link https://ndclist.com/ndc/60505-3454...
    Processing link https://ndclist.com/ndc/60505-3454/package/60505-3454-5... 
        60505345405
    Processing link https://ndclist.com/ndc/60505-3454/package/60505-3454-6... 
        60505345406
    Processing link https://ndclist.com/ndc/60505-3454/package/60505-3454-8... 
        60505345408
Processing link https://ndclist.com/ndc/68001-228...
    Processing link https://ndclist.com/ndc/68001-228/package/68001-228-04... 
        68001022804
Processing link https://ndclist.com/ndc/68462-461...
    Processing link https://ndclist.com/ndc/68462-461/package/68462-461-05... 
        68462046105
    Processing link https://ndclist.com/ndc/68462-461/package/68462-461-30... 
        68462046130
    Processing link https://ndclist.com/ndc/68462-461/package/68462-461-60... 
        68462046160
Processing link https://ndclist.com/ndc/69097-912...
    Processing link https://ndclist.com/ndc/69097-912/package/69097-912-02... 
        69097091202
    Processing link https://ndclist.com/ndc/69097-912/package/69097-912-03... 
        69097091203
    Processing link https://ndclist.com/ndc/69097-912/package/69097-912-15... 
        69097091215
Processing link https://ndclist.com/ndc/69150-258...
    Processing link https://ndclist.com/ndc/69150-258/package/69150-258-06... 
        69150025806
Processing link https://ndclist.com/ndc/76282-336...
    Processing link https://ndclist.com/ndc/76282-336/package/76282-336-60... 
        76282033660
Trospium
['00574011830', '00574014560', '00591363605', '00591363630', '00591363660', '23155053002', '23155053005', '23155053006', '42291084660', '60429009830', '60505345405', '60505345406', '60505345408', '68001022804', '68462046105', '68462046130', '68462046160', '69097091202', '69097091203', '69097091215', '69150025806', '76282033660']
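
To see what the selector does without hitting the site, here is a small offline sketch. The HTML fragment, including the extra `<small>` element, is hypothetical, and the sketch uses the `:-soup-contains` spelling, which assumes soupsieve 2.1 or later. It also adds a None check, since select_one returns None when the label is not on the page.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment; the real page layout is an assumption here.
html = """
<table>
  <tr><td>NDC Package Code</td><td>0574-0118-30</td></tr>
  <tr><td>11-Digit NDC Billing Format</td><td>00574011830 <small>hypothetical note</small></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

label_text = "11-Digit NDC Billing Format"
# 'td:-soup-contains("...")' matches the <td> whose text includes the label;
# '+ td' then steps to the <td> element immediately following it.
value_cell = soup.select_one('td:-soup-contains("{}") + td'.format(label_text))

if value_cell is not None:
    # contents[0] is the first child of the cell, here the text before <small>.
    ndc_11 = value_cell.contents[0].strip()
else:
    # Label not found on this page.
    ndc_11 = None

print(ndc_11)
```

Selecting by the label text rather than by row position should keep working if the site adds or reorders rows, as long as the label itself stays the same.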

Upvotes: 2
