I have a list of college basketball teams in my database, and the names match exactly how they appear on the site I'm trying to parse. The site is essentially one big table, and I want to select specific cells based on the teams in my database. I don't have a preference between CSS and XPath selectors; whatever works. Here is a small sample of what's returned (I added some line breaks for readability):
require 'open-uri' # provides URI.open for fetching the page over HTTP
doc = Nokogiri::HTML(URI.open("http://kenpom.com/"))
=> #<Nokogiri::XML::Element:0x3fd7f39c14e4 name="tbody"
children=[#<Nokogiri::XML::Text:0x3fd7f39c1174 "\n">, #<Nokogiri::XML::Element:0x3fd7f39c0fd0 name="tr"
children=[#<Nokogiri::XML::Element:0x3fd7f39c0cd8 name="td"
children=[#<Nokogiri::XML::Text:0x3fd7f39c0a6c "1">]>, #<Nokogiri::XML::Element:0x3fd7f39c0800 name="td" attributes=[#<Nokogiri::XML::Attr:0x3fd7f39c0774 name="style" value="text-align:left;">]
children=[#<Nokogiri::XML::Element:0x3fd7f39c0224 name="a" attributes=[#<Nokogiri::XML::Attr:0x3fd7f39c01c0 name="href" value="team.php?team=Kentucky">]
children=[#<Nokogiri::XML::Text:0x3fd7f39bdc40 "Kentucky">]>, #<Nokogiri::XML::Text:0x3fd7f39bda38 " ">, #<Nokogiri::XML::Element:0x3fd7f39bd984 name="span" attributes=[#<Nokogiri::XML::Attr:0x3fd7f39bd90c name="class" value="seed">]
children=[#<Nokogiri::XML::Text:0x3fd7f39bd3d0 "1">]>]>, #<Nokogiri::XML::Element:0x3fd7f39bd100 name="td"
children=[#<Nokogiri::XML::Element:0x3fd7f39bcebc name="a" attributes=[#<Nokogiri::XML::Attr:0x3fd7f39bce58 name="href" value="conf.php?c=SEC">]
children=[#<Nokogiri::XML::Text:0x3fd7f39bc994 "SEC">]>]>, #<Nokogiri::XML::Element:0x3fd7f39bc69c name="td"
children=[#<Nokogiri::XML::Text:0x3fd7f39bc480 "38-1">]>, #<Nokogiri::XML::Element:0x3fd7f39bc2b4 name="td"
children=[#<Nokogiri::XML::Text:0x3fd7f39bc070 ".9757">]>, #<Nokogiri::XML::Element:0x3fd7f39b9d34 name="td" attributes=[#<Nokogiri::XML::Attr:0x3fd7f39b9cd0 name="class" value="divide">]
children=[#<Nokogiri::XML::Text:0x3fd7f39b967c "119.3">]>, #<Nokogiri::XML::Element:0x3fd7f39b93d4 name="td"
children=[#<Nokogiri::XML::Element:0x3fd7f39b9140 name="span" attributes=[#<Nokogiri::XML::Attr:0x3fd7f39b90b4 name="class" value="seed">]
children=[#<Nokogiri::XML::Text:0x3fd7f39b8a10 "5">]>]>, #<Nokogiri::XML::Element:0x3fd7f39b86dc name="td"
children=[#<Nokogiri::XML::Text:0x3fd7f39b82a4 "86.5">]>, #<Nokogiri::XML::Element:0x3fd7f39b5fcc name="td"
children=[#<Nokogiri::XML::Element:0x3fd7f39b5db0 name="span" attributes=[#<Nokogiri::XML::Attr:0x3fd7f39b5d24 name="class" value="seed">]
children=[#<Nokogiri::XML::Text:0x3fd7f39b57e8 "2">]>]>, #<Nokogiri::XML::Element:0x3fd7f39b54b4 name="td" attributes=[#<Nokogiri::XML::Attr:0x3fd7f39b5450 name="class" value="divide">]
children=[#<Nokogiri::XML::Text:0x3fd7f39b4e38 "63.5">]>, #<Nokogiri::XML::Element:0x3fd7f39b4ab4 name="td"
children=[#<Nokogiri::XML::Element:0x3fd7f39b4820 name="span" attributes=[#<Nokogiri::XML::Attr:0x3fd7f39b47bc name="class" value="seed">]
children=[#<Nokogiri::XML::Text:0x3fd7f39b4258 "251">]>]>, #<Nokogiri::XML::Element:0x3fd7f39b1ef4 name="td" attributes=[#<Nokogiri::XML::Attr:0x3fd7f39b1e54 name="class" value="divide">]
children=[#<Nokogiri::XML::Text:0x3fd7f39b1904 "+.048">]>, #<Nokogiri::XML::Element:0x3fd7f39b1710 name="td"
children=[#<Nokogiri::XML::Element:0x3fd7f39b1314 name="span" attributes=[#<Nokogiri::XML::Attr:0x3fd7f39b1288 name="class" value="seed">]
children=[#<Nokogiri::XML::Text:0x3fd7f39b0cfc "69">]>]>, #<Nokogiri::XML::Element:0x3fd7f39b0810 name="td" attributes=[#<Nokogiri::XML::Attr:0x3fd7f39b0798 name="class" value="divide">]
children=[#<Nokogiri::XML::Text:0x3fd7f39add68 ".6829">]>, #<Nokogiri::XML::Element:0x3fd7f39adb88 name="td"
children=[#<Nokogiri::XML::Element:0x3fd7f39ad980 name="span" attributes=[#<Nokogiri::XML::Attr:0x3fd7f39ad91c name="class" value="seed">]
children=[#<Nokogiri::XML::Text:0x3fd7f39ad430 "31">]>]>, #<Nokogiri::XML::Element:0x3fd7f39ad0ac name="td"
children=[#<Nokogiri::XML::Text:0x3fd7f39ace90 "106.0">]>, #<Nokogiri::XML::Element:0x3fd7f39acc9c name="td"
children=[#<Nokogiri::XML::Element:0x3fd7f39aca94 name="span" attributes=[#<Nokogiri::XML::Attr:0x3fd7f39aca1c name="class" value="seed">]
children=[#<Nokogiri::XML::Text:0x3fd7f39ac5a8 "31">]>]>, #<Nokogiri::XML::Element:0x3fd7f39ac2b0 name="td"
children=[#<Nokogiri::XML::Text:0x3fd7f39ac0a8 "99.2">]>, #<Nokogiri::XML::Element:0x3fd7f39a9ed4 name="td"
children=[#<Nokogiri::XML::Element:0x3fd7f39a9c90 name="span" attributes=[#<Nokogiri::XML::Attr:0x3fd7f39a9bc8 name="class" value="seed">]
children=[#<Nokogiri::XML::Text:0x3fd7f39a96f0 "29">]>]>, #<Nokogiri::XML::Element:0x3fd7f39a9394 name="td" attributes=[#<Nokogiri::XML::Attr:0x3fd7f39a9308 name="class" value="divide">]
children=[#<Nokogiri::XML::Text:0x3fd7f39a8c78 ".5560">]>, #<Nokogiri::XML::Element:0x3fd7f39a8a5c name="td"
children=[#<Nokogiri::XML::Element:0x3fd7f39a8714 name="span" attributes=[#<Nokogiri::XML::Attr:0x3fd7f39a864c name="class" value="seed">]
children=[#<Nokogiri::XML::Text:0x3fd7f39a8084 "100">]>]>]>, #<Nokogiri::XML::Text:0x3fd7f3c61cd0 "\n">, #<Nokogiri::XML::Element:0x3fd7f3c61b90 name="tr"
children=[#<Nokogiri::XML::Element:0x3fd7f3c61960 name="td"
children=[#<Nokogiri::XML::Text:0x3fd7f3c61708 "2">]>,
I have a team.name of "Kentucky" in my database, so I want to target the rank of Kentucky. How would I do that?
Rank: 1 is found at //*[@id="ratings-table"]/tbody[1]/tr[1]/td[1]
Team: Kentucky is found at //*[@id="ratings-table"]/tbody[1]/tr[1]/td[2]
How do I find the "Rank" cell by searching for "Kentucky"? I'm interested in a few other columns too, but this one example should cover the rest.
Thank you!
This is one possible XPath:
//*[@id="ratings-table"]/tbody/tr[contains(td[2],"Kentucky")]/td[1]
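The XPath looks for the tr element whose td[2] child contains the word "Kentucky", and then returns that row's td[1] child. Note that contains() is a substring test, so it also matches any team whose name merely includes "Kentucky" (for example, "Western Kentucky").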
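Alternatively, you can check for an exact match on the a child of td[2] to find the target row, and then return the target column (td) element. Matching on the a element rather than on td[2] itself matters because the string value of td[2] also includes the seed span (for Kentucky it is "Kentucky 1"), so td[2] = "Kentucky" would not match: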
//*[@id="ratings-table"]/tbody/tr[td[2]/a = "Kentucky"]/td[1]