Adam

Reputation: 325

Finding first row of a table with Beautiful Soup

I'm working on an assignment for class. I need to write something that will return the first row of the table on this webpage (the Barr v. Lee row): https://www.supremecourt.gov/opinions/slipopinion/19

I've seen other questions that some might consider similar, but they don't seem to answer my question. In most of them, it looks like the asker already has the table on hand rather than having to pull it down from a website first.

Or, maybe I just can't see the resemblance. I've been scraping for about a week now.

Right now, I'm trying to build a loop that goes through all the div elements with an incrementing counter, so that the counter tells me which div holds that row and I can drill into it.

This is what I have so far:

for divs in soup_doc:
    div_counter = 0
    soup_doc.find_all('div')[div_counter]
    div_counter = div_counter + 1
    print(div_counter)

But right now, it's only returning 1, which I know isn't right. What should I do to fix this? Or is there a better way to go about getting this information?

My output should be:

63
7/14/20
20A8
Barr v. Lee

PC
591/2

Upvotes: 2

Views: 1243

Answers (2)

MendelG

Reputation: 20018

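To get the first row, you can use the CSS selector .in tr:nth-of-type(2) td. It matches the td cells of the second tr inside the element with class "in", which on this page is the Barr v. Lee row: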

import requests
from bs4 import BeautifulSoup

URL = "https://www.supremecourt.gov/opinions/slipopinion/19"

soup = BeautifulSoup(requests.get(URL).content, "html.parser")

for tag in soup.select('.in tr:nth-of-type(2) td'):
    print(tag.text)

Output:

63
7/14/20
20A8
Barr v. Lee
 
PC
591/2
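
If you want to try the selector without hitting the website, here is a minimal sketch that runs it against an inline snippet. SAMPLE_HTML and first_data_row are made-up names for illustration, and the markup is only a guess at the general shape of the page (a header row followed by data rows), so the real page may well differ:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page's markup; only the first data row's
# values come from the question, everything else is placeholder text.
SAMPLE_HTML = """
<div class="in">
  <table>
    <tr><th>Col 1</th><th>Col 2</th><th>Col 3</th><th>Col 4</th></tr>
    <tr><td>63</td><td>7/14/20</td><td>20A8</td><td>Barr v. Lee</td></tr>
    <tr><td>placeholder</td><td>placeholder</td><td>placeholder</td><td>placeholder</td></tr>
  </table>
</div>
"""


def first_data_row(html):
    """Return the cell texts of the second <tr> inside the .in element."""
    soup = BeautifulSoup(html, "html.parser")
    # select_one returns the first match, or None if nothing matches.
    row = soup.select_one(".in tr:nth-of-type(2)")
    if row is None:
        return []
    # strip=True trims surrounding whitespace from each cell.
    return [td.get_text(strip=True) for td in row.find_all("td")]


cells = first_data_row(SAMPLE_HTML)
print(cells)
```

Returning a list instead of printing inside the loop makes it easier to check for a missing row before using the values.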

Upvotes: 1

Patrik

Reputation: 499

In your example, div_counter = 0 has to go before the loop, like this:

div_counter = 0
for divs in soup_doc:
  soup_doc.find_all('div')[div_counter]
  div_counter = div_counter + 1
  print(div_counter)

You always get 1 because you set div_counter to 0 inside your for-loop at the beginning of each iteration and then add 1.
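
To see the difference in isolation, here is a small sketch that uses a plain list in place of the result of soup_doc.find_all('div'). The list contents and the function names are made up for illustration. It also shows enumerate, which is one common way to avoid managing the counter by hand:

```python
# Hypothetical stand-in for the list that soup_doc.find_all('div') returns.
divs = ["div-a", "div-b", "div-c"]


def count_with_reset(items):
    """Reproduces the bug: the counter is reset on every iteration."""
    seen = []
    for _ in items:
        counter = 0            # reset each time through the loop
        counter = counter + 1
        seen.append(counter)
    return seen


def count_without_reset(items):
    """The fix: initialise the counter once, before the loop."""
    seen = []
    counter = 0
    for _ in items:
        counter = counter + 1
        seen.append(counter)
    return seen


def count_with_enumerate(items):
    """Same result, letting enumerate keep track of the position."""
    return [position for position, _ in enumerate(items, start=1)]


print(count_with_reset(divs))      # [1, 1, 1]
print(count_without_reset(divs))   # [1, 2, 3]
print(count_with_enumerate(divs))  # [1, 2, 3]
```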

Upvotes: 1
