I have a question related to a previous one, but I realized I need to go one level deeper to get the 11-digit NDC code instead of the 10-digit NDC code. Rather than converting them later, I thought I could just grab them up front. Here is the link to the previous question: Is there a way to parse data from multiple pages from a parent webpage? What I want to do is click on the links here (which is the 2nd level, by the way)
and then grab the 11-digit NDC codes that result on the following page.
I am able to write the code to get to the page, but I'm not sure how to select the number. It is in a tr tag and then a td tag, and I only want one specific row of the table, so I thought I could index it like this, but I'm getting None and td throughout my list. Here is my code:
import requests
from bs4 import BeautifulSoup

url = 'https://ndclist.com/?s=Trospium'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
all_data = []
for a in soup.select('[data-title="NDC"] a[href]'):
    link_url = a['href']
    print('Processing link {}...'.format(link_url))
    soup2 = BeautifulSoup(requests.get(link_url).content, 'html.parser')
    for b in soup2.select('#product-packages a'):
        link_url2 = b['href']
        print('Processing link {}... '.format(link_url2))
        soup3 = BeautifulSoup(requests.get(link_url2).content, 'html.parser')
        # Take the second <tr> (index 1) and loop over its children
        for link in soup3.findAll('tr', limit=7)[1]:
            print(link.name)
            all_data.append(link.name)
print('Trospium')
print(all_data)
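The None and td entries come from looping over the row Tag itself: that walks its direct children, and the whitespace text between the cells has no tag name (its .name is None). A minimal offline sketch of this, using a made-up, simplified table rather than the real ndclist.com markup, also shows how selecting the cell that follows a label cell returns the value instead. It assumes soupsieve 2.1 or newer for the :-soup-contains() pseudo-class (older versions called it :contains()).

```python
from bs4 import BeautifulSoup

# Made-up, simplified stand-in for the table on a package page;
# the real ndclist.com markup has more rows and attributes.
HTML = """
<table>
  <tr>
    <td>NDC Package Code</td>
    <td>0574-0118-30</td>
  </tr>
  <tr>
    <td>11-Digit NDC Billing Format</td>
    <td>00574011830</td>
  </tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
row = soup.find_all("tr")[1]

# Iterating over a Tag walks its direct children. The whitespace between the
# cells is a NavigableString whose .name is None; each cell is a Tag named "td".
child_names = [child.name for child in row]
print(child_names)

# Selecting the td that immediately follows the label cell gives the value.
value_cell = soup.select_one('td:-soup-contains("11-Digit NDC Billing Format") + td')
ndc11 = value_cell.get_text(strip=True) if value_cell is not None else None
print(ndc11)
```

The sibling combinator (+) only considers element siblings, so the whitespace text nodes that produced None above do not get in the way.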
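Only minor modifications to your code are needed: instead of looping over a whole tr row, select the td that immediately follows the "11-Digit NDC Billing Format" label cell: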
import requests
from bs4 import BeautifulSoup

url = 'https://ndclist.com/?s=Trospium'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

all_data = []
# Level 1: search results -> product pages
for a in soup.select('[data-title="NDC"] a[href]'):
    link_url = a['href']
    print('Processing link {}...'.format(link_url))
    soup2 = BeautifulSoup(requests.get(link_url).content, 'html.parser')
    # Level 2: product page -> package pages
    for b in soup2.select('#product-packages a'):
        link_url2 = b['href']
        print('\tProcessing link {}... '.format(link_url2))
        soup3 = BeautifulSoup(requests.get(link_url2).content, 'html.parser')
        # Level 3: the td right after the "11-Digit NDC Billing Format" label cell
        # (':-soup-contains' is the current name of soupsieve's deprecated ':contains')
        cell = soup3.select_one('td:-soup-contains("11-Digit NDC Billing Format") + td')
        if cell is None:
            continue  # skip pages without that row instead of crashing
        ndc_billing_format = cell.contents[0].strip()
        print('\t\t{}'.format(ndc_billing_format))
        all_data.append(ndc_billing_format)

print('Trospium')
print(all_data)
Prints:
Processing link https://ndclist.com/ndc/0574-0118...
    Processing link https://ndclist.com/ndc/0574-0118/package/0574-0118-30...
        00574011830
Processing link https://ndclist.com/ndc/0574-0145...
    Processing link https://ndclist.com/ndc/0574-0145/package/0574-0145-60...
        00574014560
Processing link https://ndclist.com/ndc/0591-3636...
    Processing link https://ndclist.com/ndc/0591-3636/package/0591-3636-05...
        00591363605
    Processing link https://ndclist.com/ndc/0591-3636/package/0591-3636-30...
        00591363630
    Processing link https://ndclist.com/ndc/0591-3636/package/0591-3636-60...
        00591363660
Processing link https://ndclist.com/ndc/23155-530...
    Processing link https://ndclist.com/ndc/23155-530/package/23155-530-02...
        23155053002
    Processing link https://ndclist.com/ndc/23155-530/package/23155-530-05...
        23155053005
    Processing link https://ndclist.com/ndc/23155-530/package/23155-530-06...
        23155053006
Processing link https://ndclist.com/ndc/42291-846...
    Processing link https://ndclist.com/ndc/42291-846/package/42291-846-60...
        42291084660
Processing link https://ndclist.com/ndc/60429-098...
    Processing link https://ndclist.com/ndc/60429-098/package/60429-098-30...
        60429009830
Processing link https://ndclist.com/ndc/60505-3454...
    Processing link https://ndclist.com/ndc/60505-3454/package/60505-3454-5...
        60505345405
    Processing link https://ndclist.com/ndc/60505-3454/package/60505-3454-6...
        60505345406
    Processing link https://ndclist.com/ndc/60505-3454/package/60505-3454-8...
        60505345408
Processing link https://ndclist.com/ndc/68001-228...
    Processing link https://ndclist.com/ndc/68001-228/package/68001-228-04...
        68001022804
Processing link https://ndclist.com/ndc/68462-461...
    Processing link https://ndclist.com/ndc/68462-461/package/68462-461-05...
        68462046105
    Processing link https://ndclist.com/ndc/68462-461/package/68462-461-30...
        68462046130
    Processing link https://ndclist.com/ndc/68462-461/package/68462-461-60...
        68462046160
Processing link https://ndclist.com/ndc/69097-912...
    Processing link https://ndclist.com/ndc/69097-912/package/69097-912-02...
        69097091202
    Processing link https://ndclist.com/ndc/69097-912/package/69097-912-03...
        69097091203
    Processing link https://ndclist.com/ndc/69097-912/package/69097-912-15...
        69097091215
Processing link https://ndclist.com/ndc/69150-258...
    Processing link https://ndclist.com/ndc/69150-258/package/69150-258-06...
        69150025806
Processing link https://ndclist.com/ndc/76282-336...
    Processing link https://ndclist.com/ndc/76282-336/package/76282-336-60...
        76282033660
Trospium
['00574011830', '00574014560', '00591363605', '00591363630', '00591363660', '23155053002', '23155053005', '23155053006', '42291084660', '60429009830', '60505345405', '60505345406', '60505345408', '68001022804', '68462046105', '68462046130', '68462046160', '69097091202', '69097091203', '69097091215', '69150025806', '76282033660']
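The scraped values line up with the 10-digit package codes in the URLs: each hyphenated code (laid out 4-4-2, 5-3-2 or 5-4-1) becomes the 11-digit 5-4-2 billing form by left-padding its short segment with a zero, e.g. 0574-0118-30 -> 00574011830, 23155-530-02 -> 23155053002, 60505-3454-5 -> 60505345405. If you ever want the "convert them later" route from the question, or a cross-check on scraped values, here is a minimal sketch; the function name ndc10_to_ndc11 is hypothetical, and it assumes hyphenated input like the package codes above.

```python
# Segment layouts (labeler-product-package) a hyphenated 10-digit NDC can have.
VALID_LAYOUTS = {(4, 4, 2), (5, 3, 2), (5, 4, 1)}


def ndc10_to_ndc11(ndc: str) -> str:
    """Convert a hyphenated 10-digit NDC such as '0574-0118-30' to the
    11-digit 5-4-2 billing form such as '00574011830'."""
    parts = ndc.split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected three hyphen-separated digit groups: {ndc!r}")
    if tuple(len(p) for p in parts) not in VALID_LAYOUTS:
        raise ValueError(f"not a 4-4-2, 5-3-2 or 5-4-1 NDC: {ndc!r}")
    labeler, product, package = parts
    # Left-pad each segment to 5, 4 and 2 digits; only the short one changes.
    return labeler.zfill(5) + product.zfill(4) + package.zfill(2)


if __name__ == "__main__":
    for code in ["0574-0118-30", "23155-530-02", "60505-3454-5"]:
        print(code, "->", ndc10_to_ndc11(code))
```

Validating the layout first keeps malformed input (for example a 3-5-2 split) from silently producing a 10-character result.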