XML to CSV ruby

Question

I have multiple XML sample files and I would like to convert them in CSV, but there will be multiple of different attributes/nodes for different XML files therefore I dont want to hard code the different attributes. I would like the output to show column header as the first row, then each of the node/record being display vertically like a traditional column and row spreadsheet. Here is one the sample XML:



  
    
      30000
      1900-01-01
      
        Chief Executive Officer
      
    
  
  
    
      30100
      1900-01-01
      
        Administrator Job Profile
      
    
  
  
    
      30200
      1900-01-01
      
        0
        Facilities & Grounds Maintenance Attendant
        0
        Maintenance Job Title
        Maintain cleanliness of the campus building throughout the day and fulfill special requests as needed.
        <p>Job Description<b> rich text!</b></p>
        <p><b><i><span class="emphasis-2"><u>Additional</u></span></i></b> Job Description<b> rich text!</b></p>
        0
        1    
      
    
  
  
    
      30300
      1900-01-01
      
        0
        Sample_Job_Title
        0
        Sample Job Profile Summary
        Sample Job Description
        Sample Additional Job Description
        1

The code I used but come out incorrectly:

require 'csv'
require 'nokogiri'

file = File.read('jobProfile.xml')
doc = Nokogiri::XML(file)
a = []

CSV.open('xmloutput.csv', 'wb') do |csv|
  csv << doc.at('.').search('*').map(&:name)
  doc.search('.').each do |x|
    csv << x.search('*').map(&:text)
  end
end

Each set of record with its column headers and data being set horizontally. But I would like to iterate the data and keep one row of column headers. I am not sure how to do that without hard coding each of the attributes :/ Please help as I am still new to programming, and I have tried to find solution for a week :(

screenshot showing the csv output

peter · Accepted Answer

You need to first build an array of hashes and extract the keys as headers and then put the values in the right column, all nodes flattened to columns, ignoring the root and record keys.

Something like this

require 'nokogiri'
require 'set'

file    = File.read('jobProfile.xml')
doc     = Nokogiri::XML(file)
record  = {}
keys    = Set.new
records = []
csv     = ""

doc.traverse do |node| 
  value = node.text.gsub(/
 +/, '')
  if node.name
    if node.name != "text" # skip these nodes
      if value.length > 0 # skip empty nodes
        key = node.name.gsub(/sd:/,'').to_sym
        # if a new and not empty record, add to our records collection
        if key == :Job_Profile_Data && !record.empty?
          records << record
          record = {}
        elsif key[/Job_Profile|^root$|^document$/]
          # neglect these keys
        else
          key = node.name.gsub(/sd:/,'').to_sym
          # in case our value is html instead of text
          record[key] = Nokogiri::HTML.parse(value).text
          # add to our key set only if not allready in the set
          keys << key
        end
      end
    end
  end
end

# build our csv
File.open('./xmloutput.csv', 'w') do |file|
  file.puts %Q{"#{keys.to_a.join('","')}"}
  records.each do |record|
    keys.each do |key|
      file.write %Q{"#{record[key]}",}
    end
    file.write "
"
  end
end

Which gives in our csv file the following

"Job_Code","Effective_Date","Job_Title","Inactive","Include_Job_Code_in_Name","Job_Description","Additional_Job_Description","Work_Shift_Required","Public_Job"
"30000","1900-01-01","Chief Executive Officer","","","","","","",
"30100","1900-01-01","Administrator Job Profile","","","","","","",
"30200","1900-01-01","Facilities & Grounds Maintenance Attendant","0","0","Job Description rich text!","Additional Job Description rich text!","0","1",
"30300","1900-01-01","Sample_Job_Title","0","0","Sample Job Description","Sample Additional Job Description","1","",

XML to CSV ruby

Answers (1)

Related Questions