frenchloaf

Reputation: 1054

Grabbing the text value of an XML node + Nokogiri and xpath

I have built a rake file to insert all of the information I grab about a certain property into my database. This is all working, but the values for my keys are not being populated with any data. Am I possibly making my at_xpath calls incorrectly? I'll post an example below:

information = {
            "street_address" => property.at_xpath("/Address/AddressLine1/text()"),
            "city" => property.at_xpath("/Address/City/text()"),
            "zipcode" => property.at_xpath("/Address/PostalCode/text()"),
            "short_description" => property.at_xpath("/Information/ShortDescription/text()"),
            "long_description" => property.at_xpath("Information/LongDescription/text()"),
            "rent" => property.at_xpath("/Information/Rents/StandardRent/text()"),
            "application_fee" => property.at_xpath("/Fee/ApplicationFee/text()"),
            "bedrooms" => property.at_xpath("/Floorplan/Room[@RoomType='Bedroom']/Count/text()"),
            "bathrooms" => property.at_xpath("/Floorplan/Room[@RoomType='Bathroom']/Count/text()"),
            "bathrooms" => property.at_xpath("/ILS_Unit/Availability/VacancyClass/text()")
        }

I know everything is working perfectly aside from putting the data into the actual value spaces in the hash listed above. I also know that Nokogiri and XPath are working properly, as I have narrowed the number of properties down from 33,000+ to 1,068.

Any guidance would be super appreciated! Thank you :)

========================= UPDATE ============================

I thought seeing the whole loop might help add clarity --

doc.xpath("//Property/PropertyID/Identification[@OrganizationName='northsteppe']").each do |property|

        # GATHER EACH PROPERTY'S INFORMATION
        information = {
            "street_address" => property.at_xpath("/Address/AddressLine1/text()"),
            "city" => property.at_xpath("/Address/City/text()"),
            "zipcode" => property.at_xpath("/Address/PostalCode/text()"),
            "short_description" => property.at_xpath("/Information/ShortDescription/text()"),
            "long_description" => property.at_xpath("Information/LongDescription/text()"),
            "rent" => property.at_xpath("/Information/Rents/StandardRent/text()"),
            "application_fee" => property.at_xpath("/Fee/ApplicationFee/text()"),
            "bedrooms" => property.at_xpath("/Floorplan/Room[@RoomType='Bedroom']/Count/text()"),
            "bathrooms" => property.at_xpath("/Floorplan/Room[@RoomType='Bathroom']/Count/text()"),
            "bathrooms" => property.at_xpath("/ILS_Unit/Availability/VacancyClass/text()")
        }


        # CREATE NEW PROPERTY WITH INFORMATION HASH CREATED ABOVE
        if Property.create!(information)
            puts "yay!"
        else
            puts "oh no! this sucks!"
        end

    end # ENDS XPATH EACH LOOP

============================ ANOTHER UPDATE ==========================

I tried swapping out the "/text()" at the end of each at_xpath path with "/inner_text()" and received the following error:

rake aborted! Invalid expression: /Address/AddressLine1/inner_text()

I then tried switching my "at_xpath" calls to "at_css" calls and doing something like --

"street_address" => property.at_css(".AddressLine1").text

but received the following error:

rake aborted! undefined method `text' for nil:NilClass

============================= UPDATE TO SHOW XML ===========================

<Property IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8">
  <PropertyID>
    <Identification IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8" OrganizationName="northsteppe" IDType="property"/>
    <Identification IDValue="6e1e61523972d5f0e260e3d38eb488337424f21e" OrganizationName="northsteppe" IDType="Company"/>
    <MarketingName>Spacious House Central Campus OSU, available fall</MarketingName>
    <WebSite>http://northsteppe.appfolio.com/listings/listings/642da00e-9be3-4a7c-bd50-66a4f0d70af8</WebSite>
    <Address AddressType="property">
      <Description>Address of Available Listing</Description>
      <AddressLine1>1689 N 4th St </AddressLine1>
      <City>Columbus</City>
      <State>OH</State>
      <PostalCode>43201</PostalCode>
      <Country>US</Country>
    </Address>
    <Phone PhoneType="office">
      <PhoneNumber>(614) 299-4110</PhoneNumber>
    </Phone>
    <Email>[email protected]</Email>
  </PropertyID>
  <ILS_Identification ILS_IdentificationType="Apartment" RentalType="Market Rate">
    <Latitude>39.997694</Latitude>
    <Longitude>-82.99903</Longitude>
    <LastUpdate Month="11" Day="11" Year="2013"/>
  </ILS_Identification>
  <Information>
    <StructureType>Standard</StructureType>
    <UnitCount>1</UnitCount>
    <ShortDescription>Spacious House Central Campus OSU, available fall</ShortDescription>
    <LongDescription>One of our favorites! This great house is perfect for students or a single family. With huge living and sleeping rooms, there is plenty of space. The kitchen is totally modernized with new appliances, and the bathroom has been updated. Natural woodwork and brick accents are seen within the house, and the decorative mantles. Ceiling fans and mini-blinds are included, as well as a FREE stack washer and dryer. The front and side deck. On site parking available.</LongDescription>
    <Rents>
      <StandardRent>2000.00</StandardRent>
    </Rents>
    <PropertyAvailabilityURL>http://northsteppe.appfolio.com/listings/listings/642da00e-9be3-4a7c-bd50-66a4f0d70af8</PropertyAvailabilityURL>
  </Information>
  <Fee>
    <ProrateType>Standard</ProrateType>
    <LateType>Standard</LateType>
    <LatePercent>0</LatePercent>
    <LateMinFee>0</LateMinFee>
    <LateFeePerDay>0</LateFeePerDay>
    <NonRefundableHoldFee>0</NonRefundableHoldFee>
    <AdminFee>0</AdminFee>
    <ApplicationFee>30.00</ApplicationFee>
    <BrokerFee>0</BrokerFee>
  </Fee>
  <Deposit DepositType="Security Deposit">
    <Amount AmountType="Actual">
      <ValueRange Exact="2000.00" Currency="USD"/>
    </Amount>
  </Deposit>
  <Policy>
    <Pet Allowed="false"/>
  </Policy>
  <Phase IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8">
    <Name/>
    <Description/>
    <UnitCount>1</UnitCount>
    <RentableUnits>1</RentableUnits>
    <TotalSquareFeet>0</TotalSquareFeet>
    <RentableSquareFeet>0</RentableSquareFeet>
  </Phase>
  <Building IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8">
    <Name/>
    <Description/>
    <UnitCount>1</UnitCount>
    <SquareFeet>0</SquareFeet>
  </Building>
  <Floorplan IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8">
    <Name/>
    <UnitCount>1</UnitCount>
    <Room RoomType="Bedroom">
      <Count>4</Count>
      <Comment/>
    </Room>
    <Room RoomType="Bathroom">
      <Count>1</Count>
      <Comment/>
    </Room>
    <SquareFeet Min="0" Max="0"/>
    <MarketRent Min="2000" Max="2000"/>
    <EffectiveRent Min="2000" Max="2000"/>
  </Floorplan>
  <ILS_Unit IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8">
    <Units>
      <Unit>
        <Identification IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8" OrganizationName="UL Portfolio"/>
        <MarketingName>Spacious House Central Campus OSU, available fall</MarketingName>
        <UnitBedrooms>4</UnitBedrooms>
        <UnitBathrooms>1.0</UnitBathrooms>
        <MinSquareFeet>0</MinSquareFeet>
        <MaxSquareFeet>0</MaxSquareFeet>
        <SquareFootType>internal</SquareFootType>
        <UnitRent>2000.00</UnitRent>
        <MarketRent>2000.00</MarketRent>
        <Address AddressType="property">
          <AddressLine1>1689 N 4th St </AddressLine1>
          <City>Columbus</City>
          <PostalCode>43201</PostalCode>
          <Country>US</Country>
        </Address>
      </Unit>
    </Units>
    <Availability>
      <VacateDate Month="7" Day="23" Year="2014"/>
      <VacancyClass>Unoccupied</VacancyClass>
      <MadeReadyDate Month="7" Day="23" Year="2014"/>
    </Availability>
    <Amenity AmenityType="Other">
      <Description>All new stainless steel appliances!  Refinished hardwood floors</Description>
    </Amenity>
    <Amenity AmenityType="Other">
      <Description>Ceramic tile</Description>
    </Amenity>
    <Amenity AmenityType="Other">
      <Description>Ceiling fans</Description>
    </Amenity>
    <Amenity AmenityType="Other">
      <Description>Wrap-around porch</Description>
    </Amenity>
    <Amenity AmenityType="Dryer">
      <Description>Free Washer and Dryer</Description>
    </Amenity>
    <Amenity AmenityType="Washer">
      <Description>Free Washer and Dryer</Description>
    </Amenity>
    <Amenity AmenityType="Other">
      <Description>off-street parking available</Description>
    </Amenity>
  </ILS_Unit>
  <File Active="true" FileID="820982141">
    <FileType>Photo</FileType>
    <Description>Unit Photo</Description>
    <Name/>
    <Caption/>
    <Format>image/jpeg</Format>
    <Src>http://pa.cdn.appfolio.com/northsteppe/images/31077069-6e81-4373-8a89-508c57585543/medium.jpg</Src>
    <Width>360</Width>
    <Height>300</Height>
    <Rank>1</Rank>
  </File>
  <File Active="true" FileID="820982145">
    <FileType>Photo</FileType>
    <Description>Unit Photo</Description>
    <Name/>
    <Caption/>
    <Format>image/jpeg</Format>
    <Src>http://pa.cdn.appfolio.com/northsteppe/images/84e1be40-96fd-4717-b75d-09b39231a762/medium.jpg</Src>
    <Width>350</Width>
    <Height>265</Height>
    <Rank>2</Rank>
  </File>
  <File Active="true" FileID="820982149">
    <FileType>Photo</FileType>
    <Description>Unit Photo</Description>
    <Name/>
    <Caption/>
    <Format>image/jpeg</Format>
    <Src>http://pa.cdn.appfolio.com/northsteppe/images/cd419635-c37f-4676-a43e-c72671a2a748/medium.jpg</Src>
    <Width>350</Width>
    <Height>265</Height>
    <Rank>3</Rank>
  </File>
  <File Active="true" FileID="820982152">
    <FileType>Photo</FileType>
    <Description>Unit Photo</Description>
    <Name/>
    <Caption/>
    <Format>image/jpeg</Format>
    <Src>http://pa.cdn.appfolio.com/northsteppe/images/6b68dbd5-2cde-477c-99d7-3ca33f03cce8/medium.jpg</Src>
    <Width>350</Width>
    <Height>265</Height>
    <Rank>4</Rank>
  </File>
  <File Active="true" FileID="820982155">
    <FileType>Photo</FileType>
    <Description>Unit Photo</Description>
    <Name/>
    <Caption/>
    <Format>image/jpeg</Format>
    <Src>http://pa.cdn.appfolio.com/northsteppe/images/17b6c7c0-686c-4e46-865b-11d80744354a/medium.jpg</Src>
    <Width>350</Width>
    <Height>265</Height>
    <Rank>5</Rank>
  </File>
  <File Active="true" FileID="820982157">
    <FileType>Photo</FileType>
    <Description>Unit Photo</Description>
    <Name/>
    <Caption/>
    <Format>image/jpeg</Format>
    <Src>http://pa.cdn.appfolio.com/northsteppe/images/3545ac8b-471f-404a-94b2-fcd00dd16e25/medium.jpg</Src>
    <Width>350</Width>
    <Height>265</Height>
    <Rank>6</Rank>
  </File>
  <File Active="true" FileID="820982160">
    <FileType>Photo</FileType>
    <Description>Unit Photo</Description>
    <Name/>
    <Caption/>
    <Format>image/jpeg</Format>
    <Src>http://pa.cdn.appfolio.com/northsteppe/images/02471172-2183-4bf1-a3d7-33415f902c1c/medium.jpg</Src>
    <Width>350</Width>
    <Height>265</Height>
    <Rank>7</Rank>
  </File>
</Property>

Upvotes: 0

Views: 2503

Answers (2)

Phlip

Reputation: 5343

Your first XPath is too deep. It returns an Identification where you need a PropertyID. Try this:

doc.xpath("//Property/PropertyID[ Identification/@OrganizationName = 'northsteppe' ]").each do |property|
    # GATHER EACH PROPERTY'S INFORMATION
    information = {
        "street_address" => property.at_xpath("Address/AddressLine1/text()").to_s,
        "city" => property.at_xpath("Address/City/text()").to_s,
        "zipcode" => property.at_xpath("Address/PostalCode/text()").to_s
    }
    p information
end

Upvotes: 2

the Tin Man

Reputation: 160631

In your loop you do:

doc.xpath("//Property/PropertyID/Identification[@OrganizationName='northsteppe']").each do |property|

Then, for your values you do things like:

property.at_xpath("/Address/AddressLine1/text()")

You can't use /Address/AddressLine1/text() relative to property with XPath.

Because of the leading /, Nokogiri treats /Address/AddressLine1/text() as an absolute path: start at the top of the document, find the Address node immediately below it, then find the AddressLine1 node under that, and so on. The node you call at_xpath on is ignored.

Instead use:

Address/AddressLine1/text()

That means "search relative to property", and results in the full XPath:

//Property/PropertyID/Identification[@OrganizationName='northsteppe']/Address/AddressLine1/text()

Looking at the XML you added...

The paths you want don't exist. Looking at it in PRY:

[16] (pry) main: 0> puts doc.xpath("//Property/PropertyID/Identification[@OrganizationName='northsteppe']").to_xml
<Identification IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8" OrganizationName="northsteppe" IDType="property"/><Identification IDValue="6e1e61523972d5f0e260e3d38eb488337424f21e" OrganizationName="northsteppe" IDType="Company"/>

Neither of the matched Identification nodes has children. Your loop only ever sees those empty Identification nodes, so all the values you're looking for, which you are trying to reach as child nodes, aren't there.

Instead, it looks like you want to find the Property node and work downward.

Upvotes: 1
