Reputation: 59
Hi everybody, this probably has a simple answer that I am overlooking (I am still learning).
I am trying to scrape data from a website. I am specifically after particular p elements, which are nested inside different elements. This is what the selectors for the nested elements look like:
#ctl00_body_divSearchResult > div:nth-child(5) > div.expandable-box-content.expanded > p:nth-child(2)
#ctl00_body_divSearchResult > div:nth-child(16) > div.expandable-box-content.expanded > p:nth-child(2)
#ctl00_body_divSearchResult > div:nth-child(27) > div.expandable-box-content.expanded > p:nth-child(2)
#ctl00_body_divSearchResult > div:nth-child(38) > div.expandable-box-content.expanded > p:nth-child(2)
#ctl00_body_divSearchResult > div:nth-child(49) > div.expandable-box-content.expanded > p:nth-child(2)
Here are five examples from the same page. The first div:nth-child has a different number each time, but the rest is consistent. I am after the individual p:nth-child(2) elements.
Using this code I can get one individual p element:
numbers = agent.get(urlanzctr).css('#ctl00_body_divSearchResult > div:nth-child(5) > div.expandable-box-content > p:nth-child(2)').text
But I think it would be sloppy coding to repeat this for each individual instance.
Upvotes: 0
Views: 395
Reputation: 26758
Your agent.get(url).css(selector) approach is right; it returns an array-like collection of the matching nodes.
Looking at your selectors, they're all of this structure:
#ctl00_body_divSearchResult >
div:nth-child(N) >
div.expandable-box-content.expanded >
p:nth-child(2)
The only variable is the N in div:nth-child.
You have the values 5, 16, 27, 38, 49, which is 11x + 5 for x = 0, 1, 2, 3, 4.
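As a quick check of that pattern, here is a minimal sketch in plain Ruby. The helper name child_index is made up for illustration, and the count of five blocks is taken from the five example selectors in the question.

```ruby
# Hypothetical helper: nth-child index of the x-th result block (x = 0, 1, 2, ...).
def child_index(x)
  11 * x + 5
end

# The five blocks listed in the question correspond to x = 0..4.
indices = (0..4).map { |x| child_index(x) }
p indices

# Build one selector per index, following the structure shown above.
selectors = indices.map do |n|
  "#ctl00_body_divSearchResult > div:nth-child(#{n}) > " \
    "div.expandable-box-content.expanded > p:nth-child(2)"
end
puts selectors.first
```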
So you could do something like this:
# Build the selector for the n-th child div of the results container.
def make_selector(n)
  "#ctl00_body_divSearchResult > " \
    "div:nth-child(#{n}) > " \
    "div.expandable-box-content.expanded > " \
    "p:nth-child(2)"
end

# Run the selector against an already-fetched page.
def get_matches(page, n)
  page.css(make_selector(n))
end

# Fetch the page once and reuse it for every selector.
page = agent.get(url)

idx = 5
all_matches = []
loop do
  current_matches = get_matches(page, idx)
  break if current_matches.empty?
  all_matches.concat(current_matches.to_a)
  idx += 11 # move on to the next block: 5, 16, 27, ...
end
puts all_matches.length
You might also be able to skip the intermediary selectors,
i.e. maybe just .expandable-box-content.expanded > p (or > p:nth-child(2) if you only want the second child)
would work; I have no idea what the page structure looks like.
Upvotes: 1