evkline

Reputation: 1511

How do I extract data from Nokogiri XML Document? Trying to use XPath unsuccessfully

I'm using the Vacuum gem in a Rails app to pull down data from Amazon's Product API. I'm getting back an Excon response. For a search for Books with a keyword of Ruby I get the following string when I call res.body:

<?xml version="1.0" ?>
<ItemSearchResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2011-08-01">
  <OperationRequest>
    <HTTPHeaders>
      <Header Name="UserAgent" Value="Jeff/1.0.1 (Language=Ruby; new-host-2.home)"></Header>
    </HTTPHeaders>
    <RequestId>fa6e6962-15b0-4da6-abf2-12a688820dd3</RequestId>
    <Arguments>
      <Argument Name="Operation" Value="ItemSearch"></Argument>
      <Argument Name="Service" Value="AWSECommerceService"></Argument>
      <Argument Name="ItemPage" Value="1"></Argument>
      <Argument Name="AssociateTag" Value="thestu0f-20"></Argument>
      <Argument Name="Version" Value="2011-08-01"></Argument>
      <Argument Name="Keywords" Value="Ruby"></Argument>
      <Argument Name="SignatureMethod" Value="HmacSHA256"></Argument>
      <Argument Name="SearchIndex" Value="Books"></Argument>
      <Argument Name="SignatureVersion" Value="2"></Argument>
      <Argument Name="Signature" Value="05pqRqRK6DBFuOcXRhQvMO0XOj2b8a1bnMi5eB07fjs="></Argument>
      <Argument Name="AWSAccessKeyId" Value="AKIAI25J7QK5VYQ7HTJQ"></Argument>
      <Argument Name="Timestamp" Value="2013-12-27T06:37:09Z"></Argument>
    </Arguments>
    <RequestProcessingTime>0.2768830000000000</RequestProcessingTime>
  </OperationRequest>
  <Items>
    <Request>
      <IsValid>True</IsValid>
      <ItemSearchRequest>
        <ItemPage>1</ItemPage>
        <Keywords>Ruby</Keywords>
        <ResponseGroup>Small</ResponseGroup>
        <SearchIndex>Books</SearchIndex>
      </ItemSearchRequest>
    </Request>
    <TotalResults>19360</TotalResults>
    <TotalPages>1936</TotalPages>
    <MoreSearchResultsUrl>http://www.amazon.com/gp/redirect.html?camp=2025&amp;creative=386001&amp;location=http%3A%2F%2Fwww.amazon.com%2Fgp%2Fsearch%3Fkeywords%3DRuby%26url%3Dsearch-alias%253Dstripbooks&amp;linkCode=xm2&amp;tag=thestu0f-20&amp;SubscriptionId=AKIAI25J7QK5VYQ7HTJQ</MoreSearchResultsUrl>
    <Item>
      <ASIN>0596516177</ASIN>
      <DetailPageURL>http://www.amazon.com/Ruby-Programming-Language-David-Flanagan/dp/0596516177%3FSubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3D0596516177</DetailPageURL>
      <ItemLinks>
        <ItemLink>
          <Description>Technical Details</Description>
          <URL>http://www.amazon.com/Ruby-Programming-Language-David-Flanagan/dp/tech-data/0596516177%3FSubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0596516177</URL>
        </ItemLink>
        <ItemLink>
          <Description>Add To Baby Registry</Description>
          <URL>http://www.amazon.com/gp/registry/baby/add-item.html%3Fasin.0%3D0596516177%26SubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0596516177</URL>
        </ItemLink>
        <ItemLink>
          <Description>Add To Wedding Registry</Description>
          <URL>http://www.amazon.com/gp/registry/wedding/add-item.html%3Fasin.0%3D0596516177%26SubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0596516177</URL>
        </ItemLink>
        <ItemLink>
          <Description>Add To Wishlist</Description>
          <URL>http://www.amazon.com/gp/registry/wishlist/add-item.html%3Fasin.0%3D0596516177%26SubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0596516177</URL>
        </ItemLink>
        <ItemLink>
          <Description>Tell A Friend</Description>
          <URL>http://www.amazon.com/gp/pdp/taf/0596516177%3FSubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0596516177</URL>
        </ItemLink>
        <ItemLink>
          <Description>All Customer Reviews</Description>
          <URL>http://www.amazon.com/review/product/0596516177%3FSubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0596516177</URL>
        </ItemLink>
        <ItemLink>
          <Description>All Offers</Description>
          <URL>http://www.amazon.com/gp/offer-listing/0596516177%3FSubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0596516177</URL>
        </ItemLink>
      </ItemLinks>
      <ItemAttributes>
        <Author>David Flanagan</Author>
        <Author>Yukihiro Matsumoto</Author>
        <Manufacturer>O'Reilly Media</Manufacturer>
        <ProductGroup>Book</ProductGroup>
        <Title>The Ruby Programming Language</Title>
      </ItemAttributes>
    </Item>
    <Item>
      <ASIN>1937785491</ASIN>
      <DetailPageURL>http://www.amazon.com/Programming-Ruby-1-9-2-0-Programmers/dp/1937785491%3FSubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3D1937785491</DetailPageURL>
      <ItemLinks>
        <ItemLink>
          <Description>Technical Details</Description>
          <URL>http://www.amazon.com/Programming-Ruby-1-9-2-0-Programmers/dp/tech-data/1937785491%3FSubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D1937785491</URL>
        </ItemLink>
        <ItemLink>
          <Description>Add To Baby Registry</Description>
          <URL>http://www.amazon.com/gp/registry/baby/add-item.html%3Fasin.0%3D1937785491%26SubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D1937785491</URL>
        </ItemLink>
        <ItemLink>
          <Description>Add To Wedding Registry</Description>
          <URL>http://www.amazon.com/gp/registry/wedding/add-item.html%3Fasin.0%3D1937785491%26SubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D1937785491</URL>
        </ItemLink>
        <ItemLink>
          <Description>Add To Wishlist</Description>
          <URL>http://www.amazon.com/gp/registry/wishlist/add-item.html%3Fasin.0%3D1937785491%26SubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D1937785491</URL>
        </ItemLink>
        <ItemLink>
          <Description>Tell A Friend</Description>
          <URL>http://www.amazon.com/gp/pdp/taf/1937785491%3FSubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D1937785491</URL>
        </ItemLink>
        <ItemLink>
          <Description>All Customer Reviews</Description>
          <URL>http://www.amazon.com/review/product/1937785491%3FSubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D1937785491</URL>
        </ItemLink>
        <ItemLink>
          <Description>All Offers</Description>
          <URL>http://www.amazon.com/gp/offer-listing/1937785491%3FSubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D1937785491</URL>
        </ItemLink>
      </ItemLinks>
      <ItemAttributes>
        <Author>Dave Thomas</Author>
        <Author>Andy Hunt</Author>
        <Author>Chad Fowler</Author>
        <Manufacturer>Pragmatic Bookshelf</Manufacturer>
        <ProductGroup>Book</ProductGroup>
        <Title>Programming Ruby 1.9 &amp; 2.0: The Pragmatic Programmers' Guide (The Facets of Ruby)</Title>
      </ItemAttributes>
    </Item>
    ...
    <Item>
      <ASIN>1937785564</ASIN>
      <DetailPageURL>http://www.amazon.com/Agile-Development-Rails-Facets-Ruby/dp/1937785564%3FSubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3D1937785564</DetailPageURL>
      <ItemLinks>
        <ItemLink>
          <Description>Technical Details</Description>
          <URL>http://www.amazon.com/Agile-Development-Rails-Facets-Ruby/dp/tech-data/1937785564%3FSubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D1937785564</URL>
        </ItemLink>
        <ItemLink>
          <Description>Add To Baby Registry</Description>
          <URL>http://www.amazon.com/gp/registry/baby/add-item.html%3Fasin.0%3D1937785564%26SubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D1937785564</URL>
        </ItemLink>
        <ItemLink>
          <Description>Add To Wedding Registry</Description>
          <URL>http://www.amazon.com/gp/registry/wedding/add-item.html%3Fasin.0%3D1937785564%26SubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D1937785564</URL>
        </ItemLink>
        <ItemLink>
          <Description>Add To Wishlist</Description>
          <URL>http://www.amazon.com/gp/registry/wishlist/add-item.html%3Fasin.0%3D1937785564%26SubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D1937785564</URL>
        </ItemLink>
        <ItemLink>
          <Description>Tell A Friend</Description>
          <URL>http://www.amazon.com/gp/pdp/taf/1937785564%3FSubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D1937785564</URL>
        </ItemLink>
        <ItemLink>
          <Description>All Customer Reviews</Description>
          <URL>http://www.amazon.com/review/product/1937785564%3FSubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D1937785564</URL>
        </ItemLink>
        <ItemLink>
          <Description>All Offers</Description>
          <URL>http://www.amazon.com/gp/offer-listing/1937785564%3FSubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D1937785564</URL>
        </ItemLink>
      </ItemLinks>
      <ItemAttributes>
        <Author>Sam Ruby</Author>
        <Author>Dave Thomas</Author>
        <Author>David Heinemeier Hansson</Author>
        <Manufacturer>Pragmatic Bookshelf</Manufacturer>
        <ProductGroup>Book</ProductGroup>
        <Title>Agile Web Development with Rails 4 (Facets of Ruby)</Title>
      </ItemAttributes>
    </Item>
  </Items>
</ItemSearchResponse>

Next I tried creating an XML Document with:

xml_doc = Nokogiri::XML(res.body)

and get the following:

#<Nokogiri::XML::Document:0x3fcc4b3e8f94 name="document" children=[#<Nokogiri::XML::Element:0x3fcc4b3e8ae4 name="ItemSearchResponse" namespace=#<Nokogiri::XML::Namespace:0x3fcc4b3e8a58 href="http://webservices.amazon.com/AWSECommerceService/2011-08-01"> children=[#<Nokogiri::XML::Element:0x3fcc4b043074 name="OperationRequest" namespace=#<Nokogiri::XML::Namespace:0x3fcc4b3e8a58 href="http://webservices.amazon.com/AWSECommerceService/2011-08-01"> children=[#<Nokogiri::XML::Element:0x3fcc4b042c50 name="HTTPHeaders" namespace=#<Nokogiri::XML::Namespace:0x3fcc4b3e8a58 href="http://webservices.amazon.com/AWSECommerceService/2011-08-01"> children=[#<Nokogiri::XML::Element:0x3fcc4b04282c name="Header" namespace=#<Nokogiri::XML::Namespace:0x3fcc4b3e8a58 href="http://webservices.amazon.com/AWSECommerceService/2011-08-01"> attributes=[#<Nokogiri::XML::Attr:0x3fcc4b0427c8 name="Name" value="UserAgent">, #<Nokogiri::XML::Attr:0x3fcc4b0427b4 name="Value" value="Jeff/1.0.1 (Language=Ruby; new-host-2.home)">]>]>, #<Nokogiri::XML::Element:0x3fcc4b041bfc name="RequestId" namespace=#<Nokogiri::XML::Namespace:0x3fcc4b3e8a58 href="http://webservices.amazon.com/AWSECommerceService/2011-08-01"> children=[#<Nokogiri::XML::Text:0x3fcc4ade5c64 "fa6e6962-15b0-4da6-abf2-12a688820dd3">]>, #<Nokogiri::XML::Element:0x3fcc4ade5944 name="Arguments" namespace=#<Nokogiri::XML::Namespace:0x3fcc4b3e8a58 href="http://webservices.amazon.com/AWSECommerceService/2011-08-01"> children=[#<Nokogiri::XML::Element:0x3fcc4ade5264 name="Argument" namespace=#<Nokogiri::XML::Namespace:0x3fcc4b3e8a58 href="http://webservices.amazon.com/AWSECommerceService/2011-08-01"> attributes=[#<Nokogiri::XML::Attr:0x3fcc4ade5200 name="Name" value="Operation">, #<Nokogiri::XML::Attr:0x3fcc4ade51ec name="Value" value="ItemSearch">]>

I had to cut the output short to fit in this question. I have tried running several different XPath queries against this document, and every one returns an empty array. I've read the tutorial on Zeno and W3, and I'm still confused about what I'm supposed to be doing. All I want from the response is the book title and author.

Any help with where to start, or an example of how to parse this data correctly, would be greatly appreciated. Also, is it best practice to query a Nokogiri XML document with XPath or with CSS? There is also an option to turn the response into a hash. If I chose that, would parsing be any easier? Are there hash parsers available? Thanks!

NOTE

I am currently using this method of extracting the results from the request:

req = Vacuum.new

req.configure(
  aws_access_key_id: ENV["S3_ACCESS_KEY"], 
  aws_secret_access_key: ENV["S3_SECRET_KEY"],
  associate_tag: ENV["AMAZON_ASSOCIATE_TAG"]
)

params = {
  'SearchIndex' => 'Books',
  'Keywords'    => 'Keywords',
  'ItemPage'    => 1
}

item_search_res = req.item_search(params)

xml_doc = Nokogiri::XML(item_search_res.body)

asins   = xml_doc.search('ASIN').map   { |n| n.children.text }
authors = xml_doc.search('Author').map { |n| n.children.text }
titles  = xml_doc.search('Title').map  { |n| n.children.text }

Upvotes: 1

Views: 897

Answers (1)

JLRishe

Reputation: 101680

What are some of the XPaths that you've tried?

Since the source document uses namespaces, you need to either declare and use namespace prefixes:

doc.xpath("/az:ItemSearchResponse/az:Items/az:Item", "az" => "http://webservices.amazon.com/AWSECommerceService/2011-08-01")

Or you can remove the namespaces before querying the document:

doc.remove_namespaces!
doc.xpath("/ItemSearchResponse/Items/Item")

Upvotes: 3
