Martinb

Reputation: 1

Merge multiple XML files with Groovy

I'm pretty new to Groovy; I only found out about it yesterday. I'm building a site with the MkDocs static site generator, and a new internal requirement forces me to split the current site into three separate site containers so that each one gets its own search index, among other things. That part is done, is built with Jenkins, and works great.

Unfortunately, this setup produces three separate sitemaps that I need to merge, and it was suggested that I look into Groovy. I took most of the code below from "Groovy - merging XML nodes", but the only result I ever get is the first sitemap written to my file. Any suggestions as to what is going wrong here?

//Parse the three sitemaps; XmlSlurper(false, false) turns off validation and namespace awareness.
def sm1 = new XmlSlurper( false, false ).parse(new File('C://test/site-1/sitemap.xml'))
def sm2 = new XmlSlurper( false, false ).parse(new File('C://test/site-2/sitemap.xml'))
def sm3 = new XmlSlurper( false, false ).parse(new File('C://test/site-3/sitemap.xml'))

//Define the output file.
def output = new File ('C://test/sitemap.xml')

//Append url-nodes from sitemap 2 to sitemap 1 urlset.
sm2.'**'.findAll{it.name() == 'url'}.collect{ sm1.urlset.appendNode(it)}
//Append url-nodes from sitemap 3 to sitemap 1 urlset.
sm3.'**'.findAll{it.name() == 'url'}.collect{ sm1.urlset.appendNode(it)}

//Define what to write to file.
def content = groovy.xml.XmlUtil.serialize(sm1)

//Write to file.
output.newWriter().withWriter { w ->
  w << content
}

/site-1/sitemap.xml

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
     <loc>https://site/</loc>
     <lastmod>2019-01-18</lastmod>
     <changefreq>daily</changefreq>
    </url>
</urlset>

/site-2/sitemap.xml

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
     <loc>https://site/site-2/</loc>
     <lastmod>2019-01-18</lastmod>
     <changefreq>daily</changefreq>
    </url>
    <url>
     <loc>https://site/site-2/section/</loc>
     <lastmod>2019-01-18</lastmod>
     <changefreq>daily</changefreq>
    </url>
</urlset>

/site-3/sitemap.xml

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
     <loc>https://site/site-3/</loc>
     <lastmod>2019-01-18</lastmod>
     <changefreq>daily</changefreq>
    </url>
    <url>
     <loc>https://site/site-3/section/</loc>
     <lastmod>2019-01-18</lastmod>
     <changefreq>daily</changefreq>
    </url>
</urlset>

Expected output

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
     <loc>https://site/</loc>
     <lastmod>2019-01-18</lastmod>
     <changefreq>daily</changefreq>
    </url>
    <url>
     <loc>https://site/site-2/</loc>
     <lastmod>2019-01-18</lastmod>
     <changefreq>daily</changefreq>
    </url>
    <url>
     <loc>https://site/site-2/section/</loc>
     <lastmod>2019-01-18</lastmod>
     <changefreq>daily</changefreq>
    </url>
    <url>
     <loc>https://site/site-3/</loc>
     <lastmod>2019-01-18</lastmod>
     <changefreq>daily</changefreq>
    </url>
    <url>
     <loc>https://site/site-3/section/</loc>
     <lastmod>2019-01-18</lastmod>
     <changefreq>daily</changefreq>
    </url>
</urlset>

Current output

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
     <loc>https://site/</loc>
     <lastmod>2019-01-18</lastmod>
     <changefreq>daily</changefreq>
    </url>
</urlset>

Upvotes: 0

Views: 1827

Answers (1)

daggett

Reputation: 28564

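1. After parsing, the variable already references the root element, so to reach the urlset tag you use sm1 directly instead of sm1.urlset. Because sm1.urlset matches no child element, appending to it silently does nothing.

2. collect could work, but each is the better fit here, since the returned list is never used.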
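To make the first point concrete, here is a small sketch that checks what the parsed variable refers to. The one-entry XML string is a made-up example, not one of the real sitemaps.

```groovy
// On Groovy 4+ you may need: import groovy.xml.XmlSlurper
// Hypothetical one-entry sitemap, parsed the same way as in the question.
def root = new XmlSlurper(false, false).parseText(
    '<urlset><url><loc>https://site/</loc></url></urlset>')

assert root.name() == 'urlset'   // the parse result is the root element itself
assert root.url.size() == 1      // <url> children are reached directly from it
assert root.urlset.size() == 0   // there is no nested <urlset>, so this path matches nothing
```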

Working code below:

def sm1 = new XmlSlurper( false, false ).parseText('''<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
     <loc>https://site/site-1/</loc>
     <lastmod>2019-01-18</lastmod>
     <changefreq>daily</changefreq>
    </url>
    <url>
     <loc>https://site/site-1/section/</loc>
     <lastmod>2019-01-18</lastmod>
     <changefreq>daily</changefreq>
    </url>
</urlset>''')

def sm2 = new XmlSlurper( false, false ).parseText('''<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
     <loc>https://site/site-2/</loc>
     <lastmod>2019-01-18</lastmod>
     <changefreq>daily</changefreq>
    </url>
    <url>
     <loc>https://site/site-2/section/</loc>
     <lastmod>2019-01-18</lastmod>
     <changefreq>daily</changefreq>
    </url>
</urlset>''')


//Append url-nodes from sitemap 2 to sitemap 1 urlset.
sm2.url.each { sm1.appendNode(it) }

//Define what to write to file.
def content = groovy.xml.XmlUtil.serialize(sm1)
println content
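The same approach should carry over to the three files from the question. The following is only a sketch: mergeSitemaps is a hypothetical helper name, and the paths in the trailing comment are taken from the question and assumed to exist on your machine.

```groovy
import groovy.xml.XmlUtil
// On Groovy 4+ you may also need: import groovy.xml.XmlSlurper

// Hypothetical helper: appends every <url> of the later sitemaps to the first one
// and writes the merged document to the output file.
def mergeSitemaps(List<File> inputs, File output) {
    // Namespace-unaware parsing keeps xmlns as a plain attribute on <urlset>.
    def sitemaps = inputs.collect { new XmlSlurper(false, false).parse(it) }
    def merged = sitemaps.head()
    sitemaps.tail().each { sitemap ->
        // The parse result is already <urlset>, so its <url> children are sitemap.url.
        sitemap.url.each { merged.appendNode(it) }
    }
    output.setText(XmlUtil.serialize(merged), 'UTF-8')
}

// Usage with the paths from the question (assumed to exist):
// mergeSitemaps(
//     ['C:/test/site-1/sitemap.xml',
//      'C:/test/site-2/sitemap.xml',
//      'C:/test/site-3/sitemap.xml'].collect { new File(it) },
//     new File('C:/test/sitemap.xml'))
```

Taking the inputs as a list means a fourth site container only needs one more path, not another copy of the append line.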

Upvotes: 1
