Reputation: 51
I have multiple XML sample files and I would like to convert them in CSV, but there will be multiple of different attributes/nodes for different XML files therefore I dont want to hard code the different attributes. I would like the output to show column header as the first row, then each of the node/record being display vertically like a traditional column and row spreadsheet. Here is one the sample XML:
<?xml version="1.0" encoding="UTF-8"?>
<sd:root xmlns:wd="urn:com.sample/bsvc" sd:version="v31.0">
<sd:Put_Job_Profile_Request sd:Add_Only="0">
<sd:Job_Profile_Data>
<sd:Job_Code>30000</sd:Job_Code>
<sd:Effective_Date>1900-01-01</sd:Effective_Date>
<sd:Job_Profile_Basic_Data>
<sd:Job_Title>Chief Executive Officer</sd:Job_Title>
</sd:Job_Profile_Basic_Data>
</sd:Job_Profile_Data>
</sd:Put_Job_Profile_Request>
<sd:Put_Job_Profile_Request sd:Add_Only="0">
<sd:Job_Profile_Data>
<sd:Job_Code>30100</sd:Job_Code>
<sd:Effective_Date>1900-01-01</sd:Effective_Date>
<sd:Job_Profile_Basic_Data>
<sd:Job_Title>Administrator Job Profile</sd:Job_Title>
</sd:Job_Profile_Basic_Data>
</sd:Job_Profile_Data>
</sd:Put_Job_Profile_Request>
<sd:Put_Job_Profile_Request sd:Add_Only="0">
<sd:Job_Profile_Data>
<sd:Job_Code>30200</sd:Job_Code>
<sd:Effective_Date>1900-01-01</sd:Effective_Date>
<sd:Job_Profile_Basic_Data>
<sd:Inactive>0</sd:Inactive>
<sd:Job_Title>Facilities & Grounds Maintenance Attendant</sd:Job_Title>
<sd:Include_Job_Code_in_Name>0</sd:Include_Job_Code_in_Name>
<sd:Job_Profile_Private_Title>Maintenance Job Title</sd:Job_Profile_Private_Title>
<sd:Job_Profile_Summary>Maintain cleanliness of the campus building throughout the day and fulfill special requests as needed.</sd:Job_Profile_Summary>
<sd:Job_Description><p>Job Description<b> rich text!</b></p></sd:Job_Description>
<sd:Additional_Job_Description><p><b><i><span class="emphasis-2"><u>Additional</u></span></i></b> Job Description<b> rich text!</b></p></sd:Additional_Job_Description>
<sd:Work_Shift_Required>0</sd:Work_Shift_Required>
<sd:Public_Job>1</sd:Public_Job>
</sd:Job_Profile_Basic_Data>
</sd:Job_Profile_Data>
</sd:Put_Job_Profile_Request>
<sd:Put_Job_Profile_Request sd:Add_Only="0">
<sd:Job_Profile_Data>
<sd:Job_Code>30300</sd:Job_Code>
<sd:Effective_Date>1900-01-01</sd:Effective_Date>
<sd:Job_Profile_Basic_Data>
<sd:Inactive>0</sd:Inactive>
<sd:Job_Title>Sample_Job_Title</sd:Job_Title>
<sd:Include_Job_Code_in_Name>0</sd:Include_Job_Code_in_Name>
<sd:Job_Profile_Summary>Sample Job Profile Summary</sd:Job_Profile_Summary>
<sd:Job_Description>Sample Job Description</sd:Job_Description>
<sd:Additional_Job_Description>Sample Additional Job Description</sd:Additional_Job_Description>
<sd:Work_Shift_Required>1</sd:Work_Shift_Required>
</sd:Job_Profile_Basic_Data>
</sd:Job_Profile_Data>
</sd:Put_Job_Profile_Request>
</sd:root>
The code I used but come out incorrectly:
require 'csv'
require 'nokogiri'
file = File.read('jobProfile.xml')
doc = Nokogiri::XML(file)
a = []
CSV.open('xmloutput.csv', 'wb') do |csv|
csv << doc.at('.').search('*').map(&:name)
doc.search('.').each do |x|
csv << x.search('*').map(&:text)
end
end
Each set of record with its column headers and data being set horizontally. But I would like to iterate the data and keep one row of column headers. I am not sure how to do that without hard coding each of the attributes :/ Please help as I am still new to programming, and I have tried to find solution for a week :(
screenshot showing the csv output
Upvotes: 1
Views: 280
Reputation: 42182
You need to first build an array of hashes and extract the keys as headers and then put the values in the right column, all nodes flattened to columns, ignoring the root and record keys.
Something like this
require 'nokogiri'
require 'set'
file = File.read('jobProfile.xml')
doc = Nokogiri::XML(file)
record = {}
keys = Set.new
records = []
csv = ""
doc.traverse do |node|
value = node.text.gsub(/\n +/, '')
if node.name
if node.name != "text" # skip these nodes
if value.length > 0 # skip empty nodes
key = node.name.gsub(/sd:/,'').to_sym
# if a new and not empty record, add to our records collection
if key == :Job_Profile_Data && !record.empty?
records << record
record = {}
elsif key[/Job_Profile|^root$|^document$/]
# neglect these keys
else
key = node.name.gsub(/sd:/,'').to_sym
# in case our value is html instead of text
record[key] = Nokogiri::HTML.parse(value).text
# add to our key set only if not allready in the set
keys << key
end
end
end
end
end
# build our csv
File.open('./xmloutput.csv', 'w') do |file|
file.puts %Q{"#{keys.to_a.join('","')}"}
records.each do |record|
keys.each do |key|
file.write %Q{"#{record[key]}",}
end
file.write "\n"
end
end
Which gives in our csv file the following
"Job_Code","Effective_Date","Job_Title","Inactive","Include_Job_Code_in_Name","Job_Description","Additional_Job_Description","Work_Shift_Required","Public_Job"
"30000","1900-01-01","Chief Executive Officer","","","","","","",
"30100","1900-01-01","Administrator Job Profile","","","","","","",
"30200","1900-01-01","Facilities & Grounds Maintenance Attendant","0","0","Job Description rich text!","Additional Job Description rich text!","0","1",
"30300","1900-01-01","Sample_Job_Title","0","0","Sample Job Description","Sample Additional Job Description","1","",
Upvotes: 1