Reputation: 129
I need to convert a CSV spec file to a YAML file for a project. I wrote a small piece of Python code for that, but it's not working as expected. I cannot use an online converter because the client I'm working for won't accept that. Here is the Python code I have:
import csv

csvfile = open('custInfo.csv', 'r')
datareader = csv.reader(csvfile, delimiter=',', quotechar='"')
data_headings = []

yaml_pretext = "sourceTopic : 'BIG_PARTY'"
yaml_pretext += "\n"+'validationRequired : true'+"\n"
yaml_pretext += "\n"+'columnMappingEntityList :'+"\n"

for row_index, row in enumerate(datareader):
    if row_index == 0:
        # The first row holds the column headings
        data_headings = row
    else:
        # new_yaml = open('outfile.yaml', 'w')
        yaml_text = ""
        for cell_index, cell in enumerate(row):
            # Continuation lines are indented to line up under the "source" key
            lineSeperator = "   "
            cell_heading = data_headings[cell_index].lower().replace(" ", "_").replace("-", "")
            if (cell_heading == "source"):
                # "source" starts a new list item
                lineSeperator = ' - '
            cell_text = lineSeperator+cell_heading + " : " + cell.replace("\n", ", ") + "\n"
            yaml_text += cell_text
        print(yaml_text)

csvfile.close()
The csv file has 4 columns and here it is:
source               destination       type    childFields
fra:AppData          app_data          array   application_id,institute_nm
fra:ApplicationId    application_id    string  null
fra:InstituteName    institute_nm      string  null
fra:CustomerData     customer_data     array   name,customer_address,telephone_number
fra:Name             name              string  null
fra:CustomerAddress  customer_address  array   street,pincode
fra:Street           street            string  null
fra:Pincode          pincode           string  null
fra:TelephoneNumber  telephone_number  string  null
Here is the yaml file I'm getting as output
 - source : fra:AppData
   destination : app_data
   type : array
   childfields : application_id,institute_nm
 - source : fra:ApplicationId
   destination : application_id
   type : string
   childfields : null
 - source : fra:InstituteName
   destination : institute_nm
   type : string
   childfields : null
 - source : fra:CustomerData
   destination : customer_data
   type : array
   childfields : name,customer_address,telephone_number
 - source : fra:Name
   destination : name
   type : string
   childfields : null
 - source : fra:CustomerAddress
   destination : customer_address
   type : array
   childfields : street,pincode
 - source : fra:Street
   destination : street
   type : string
   childfields : null
 - source : fra:Pincode
   destination : pincode
   type : string
   childfields : null
 - source : fra:TelephoneNumber
   destination : telephone_number
   type : string
   childfields : null
When the type is array, I need the entries named in childFields to be nested under that entry instead of appearing as separate top-level entries. So the desired output is:
 - source : fra:AppData
   destination : app_data
   type : array
   childfields : application_id,institute_nm
     - source : fra:ApplicationId
       destination : application_id
       type : string
       childfields : null
     - source : fra:InstituteName
       destination : institute_nm
       type : string
       childfields : null
 - source : fra:CustomerData
   destination : customer_data
   type : array
   childfields : name,customer_address,telephone_number
     - source : fra:Name
       destination : name
       type : string
       childfields : null
     - source : fra:CustomerAddress
       destination : customer_address
       type : array
       childfields : street,pincode
         - source : fra:Street
           destination : street
           type : string
           childfields : null
         - source : fra:Pincode
           destination : pincode
           type : string
           childfields : null
     - source : fra:TelephoneNumber
       destination : telephone_number
       type : string
       childfields : null
How can I get this?
Upvotes: 2
Views: 14218
Reputation: 39708
You are not currently using a YAML library to generate the output. This is bad practice, because nothing checks whether the strings you write contain YAML special characters that would need to be quoted.
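To illustrate the quoting point, here is a minimal sketch assuming PyYAML is installed and Python 3. The `note` value and the variable names are made up for the example and do not come from the question's data. It shows that a hand-written `null` is read back as a YAML null rather than as a string, while a dumped structure survives a round trip unchanged.

```python
import yaml

# Hand-written YAML: the bare word null is parsed as a YAML null,
# so it comes back as Python's None, not as the string "null".
hand_written = yaml.safe_load("childfields : null")
print(hand_written)

# Hypothetical record whose values contain YAML-significant text.
record = {
    "source": "fra:AppData",
    "childfields": "null",                 # meant as a literal string
    "note": "key: value # not a comment",  # illustrative only
}

# Letting the library dump the structure adds quotes where needed,
# so loading the text again gives back the same values.
dumped = yaml.safe_dump(record, default_flow_style=False)
print(dumped)
restored = yaml.safe_load(dumped)
```

Whether the string "null" or a real null is wanted in the output is a decision for the spec; the library only makes sure that whichever one you put in the structure is what a parser reads back.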
Next up, this is not valid YAML:
   childfields : application_id,institute_nm
     - source : fra:ApplicationId
       destination : application_id
       type : string
       childfields : null
childfields cannot have both a scalar value (application_id,institute_nm) and a sequence value (starting with the item - source : fra:ApplicationId).
Try generating your structure with lists and dicts and then dump that structure:
import csv
import yaml

csvfile = open('custInfo.csv', 'r')
datareader = csv.reader(csvfile, delimiter=",", quotechar='"')
result = list()
type_index = -1
child_fields_index = -1

for row_index, row in enumerate(datareader):
    if row_index == 0:
        # let's do this once here
        data_headings = list()
        for heading_index, heading in enumerate(row):
            fixed_heading = heading.lower().replace(" ", "_").replace("-", "")
            data_headings.append(fixed_heading)
            # remember which columns hold the type and the child fields
            if fixed_heading == "type":
                type_index = heading_index
            elif fixed_heading == "childfields":
                child_fields_index = heading_index
    else:
        content = dict()
        is_array = False
        for cell_index, cell in enumerate(row):
            if cell_index == child_fields_index and is_array:
                # expand the comma-separated names into a list of dicts;
                # note that capitalize() turns "application_id" into
                # "Application_id", not "ApplicationId"
                content[data_headings[cell_index]] = [{
                    "source" : "fra:" + value.capitalize(),
                    "destination" : value,
                    "type" : "string",
                    "childfields" : "null"
                } for value in cell.split(",")]
            else:
                content[data_headings[cell_index]] = cell
                # only true when the type column directly precedes childfields
                is_array = (cell_index == type_index) and (cell == "array")
        result.append(content)

csvfile.close()
print(yaml.dump(result))
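A variation on the same idea, offered as a sketch rather than a drop-in replacement: instead of synthesizing the child entries, look each child up by its destination among the rows of the CSV itself. That keeps the real source names and lets an array contain another array. The function name `build_tree` and the inline `SAMPLE` text are hypothetical. The sketch assumes Python 3, a comma-delimited file in which multi-valued childFields cells are quoted, that every child name appears as some row's destination, and that a literal `null` cell should become a real null.

```python
import csv
import io

def build_tree(csv_text):
    """Nest each row under the array row that names it in childfields."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=",", quotechar='"')
    headings = [h.lower().replace(" ", "_").replace("-", "") for h in next(reader)]
    rows = [dict(zip(headings, row)) for row in reader]
    by_destination = {row["destination"]: row for row in rows}

    nested = set()  # destinations that appear as some row's child
    for row in rows:
        if row["type"] == "array":
            names = row["childfields"].split(",")
            # Replace the comma-separated names with the rows they refer to.
            row["childfields"] = [by_destination[name] for name in names]
            nested.update(names)
        elif row["childfields"] == "null":
            row["childfields"] = None  # assumption: "null" means no children
    # Only rows that are nobody's child stay at the top level.
    return [row for row in rows if row["destination"] not in nested]

# Sample data from the question, written as comma-delimited CSV.
SAMPLE = """\
source,destination,type,childFields
fra:AppData,app_data,array,"application_id,institute_nm"
fra:ApplicationId,application_id,string,null
fra:InstituteName,institute_nm,string,null
fra:CustomerData,customer_data,array,"name,customer_address,telephone_number"
fra:Name,name,string,null
fra:CustomerAddress,customer_address,array,"street,pincode"
fra:Street,street,string,null
fra:Pincode,pincode,string,null
fra:TelephoneNumber,telephone_number,string,null
"""

tree = build_tree(SAMPLE)
print([row["destination"] for row in tree])
```

The resulting list of dicts can then be handed to the YAML library's dump function in place of `result`. Because the child rows are looked up rather than generated, street and pincode end up nested under customer_address, which is itself nested under customer_data.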
Upvotes: 7