user1023627

Reputation: 183

How to recursively delete empty child elements at a specific xpath location in an XML using Nokogiri?

I have the XML below, in which a few child elements have empty text.

doc = <<'XML'
<Book>
    <BookId>BK45647</BookId>
    <BookName>The Client by John Grisham</BookName>
    <BookAuthenticationCode></BookAuthenticationCode>
    <BookCategory>Suspense</BookCategory>
    <BookSequence></BookSequence>
    <BookPublisherInfo>
        <PublisherId>PBBK12345</PublisherId>
        <PublisherName>Mc.GrawHill</PublisherName>
        <PublisherIndex></PublisherIndex>
        <PublisherCategoryQuota></PublisherCategoryQuota>
    </BookPublisherInfo>
    <BookPurchaselist>
       <Customer>
           <FirstName>John</FirstName>
           <LastName>Smith</LastName>
           <MiddleName></MiddleName>
           <NickName></NickName>
       </Customer>
        <Customer>
           <FirstName>Winston</FirstName>
           <LastName>Churchill</LastName>
           <MiddleName></MiddleName>
           <NickName></NickName>
       </Customer>
    </BookPurchaselist>
</Book>
XML

I tried the code below, but it does not work properly.

cust = doc.at_xpath("//Customer")
cust.each do |cust_obj|
    if cust_obj.has_text? == false
       cust_obj.delete
    end
end

It produces the following output:

<Book>
    <BookId>BK45647</BookId>
    <BookName>The Client by John Grisham</BookName>
    <BookAuthenticationCode></BookAuthenticationCode>
    <BookCategory>Suspense</BookCategory>
    <BookSequence></BookSequence>
    <BookPublisherInfo>
        <PublisherId>PBBK12345</PublisherId>
        <PublisherName>Mc.GrawHill</PublisherName>
        <PublisherIndex></PublisherIndex>
        <PublisherCategoryQuota></PublisherCategoryQuota>
    </BookPublisherInfo>
    <BookPurchaselist>
       <Customer>
           <FirstName>John</FirstName>
           <LastName>Smith</LastName>
           <MiddleName></MiddleName>
       </Customer>
        <Customer>
           <FirstName>Winston</FirstName>
           <LastName>Churchill</LastName>
           <NickName></NickName>
       </Customer>
    </BookPurchaselist>
</Book>

Some of the elements with empty text are deleted, while others remain. How can I recursively delete the empty elements at a specific XPath and rewrite the XML?

I am stuck here and would appreciate any suggestions.

Upvotes: 1

Views: 1283

Answers (1)

Patrick Oscity

Reputation: 54674

doc.xpath('//Customer/child::*[not(text())]').each do |node|
  node.remove
end

You can use not(node()) in place of not(text()) if you want to match only elements that have no child nodes at all (no text and no child elements).

EDIT: Full working example (using the same code as above)

require 'nokogiri'

xml = <<-XML
<Book>
    <BookId>BK45647</BookId>
    <BookName>The Client by John Grisham</BookName>
    <BookAuthenticationCode></BookAuthenticationCode>
    <BookCategory>Suspense</BookCategory>
    <BookSequence></BookSequence>
    <BookPublisherInfo>
        <PublisherId>PBBK12345</PublisherId>
        <PublisherName>Mc.GrawHill</PublisherName>
        <PublisherIndex></PublisherIndex>
        <PublisherCategoryQuota></PublisherCategoryQuota>
    </BookPublisherInfo>
    <BookPurchaselist>
       <Customer>
           <FirstName>John</FirstName>
           <LastName>Smith</LastName>
           <MiddleName></MiddleName>
       </Customer>
        <Customer>
           <FirstName>Winston</FirstName>
           <LastName>Churchill</LastName>
           <NickName></NickName>
       </Customer>
    </BookPurchaselist>
</Book>
XML

doc = Nokogiri.parse(xml)

doc.xpath('//Customer/child::*[not(text())]').each do |node|
  node.remove
end

puts doc.to_s

The output of this program is:

<?xml version="1.0"?>
<Book>
    <BookId>BK45647</BookId>
    <BookName>The Client by John Grisham</BookName>
    <BookAuthenticationCode/>
    <BookCategory>Suspense</BookCategory>
    <BookSequence/>
    <BookPublisherInfo>
        <PublisherId>PBBK12345</PublisherId>
        <PublisherName>Mc.GrawHill</PublisherName>
        <PublisherIndex/>
        <PublisherCategoryQuota/>
    </BookPublisherInfo>
    <BookPurchaselist>
       <Customer>
           <FirstName>John</FirstName>
           <LastName>Smith</LastName>

       </Customer>
        <Customer>
           <FirstName>Winston</FirstName>
           <LastName>Churchill</LastName>

       </Customer>
    </BookPurchaselist>
</Book>

Upvotes: 4
