user3444971

Reputation: 129

CSV to Yaml conversion using Python script

I need to convert a CSV spec file to a YAML file for a project. I wrote a small piece of Python code for that, but it's not working as expected. I cannot use an online converter because the client I'm working for won't accept that. Here is the Python code I have:

import csv
csvfile = open('custInfo.csv', 'r')

datareader = csv.reader(csvfile, delimiter=',', quotechar='"')
data_headings = []

yaml_pretext = "sourceTopic : 'BIG_PARTY'"
yaml_pretext += "\n"+'validationRequired : true'+"\n"
yaml_pretext += "\n"+'columnMappingEntityList :'+"\n"
for row_index, row in enumerate(datareader):
    if row_index == 0:
        data_headings = row
    else:
        # new_yaml = open('outfile.yaml', 'w')
        yaml_text = ""
        for cell_index, cell in enumerate(row):
            lineSeperator = "    "
            cell_heading = data_headings[cell_index].lower().replace(" ", "_").replace("-", "")
            if (cell_heading == "source"):
                lineSeperator = '  - '

            cell_text = lineSeperator+cell_heading + " : " + cell.replace("\n", ", ") + "\n"

            yaml_text += cell_text
        print(yaml_text)

csvfile.close()

The csv file has 4 columns and here it is:

source               destination        type     childFields
fra:AppData          app_data           array    application_id,institute_nm
fra:ApplicationId    application_id     string   null
fra:InstituteName    institute_nm       string   null
fra:CustomerData     customer_data      array    name,customer_address,telephone_number
fra:Name             name               string   null
fra:CustomerAddress  customer_address   array    street,pincode
fra:Street           street             string   null
fra:Pincode          pincode            string   null
fra:TelephoneNumber  telephone_number   string   null

Here is the yaml file I'm getting as output

  - source : fra:AppData
    destination : app_data
    type : array
    childfields : application_id,institute_nm

  - source : fra:ApplicationId
    destination : application_id
    type : string
    childfields : null

  - source : fra:InstituteName
    destination : institute_nm
    type : string
    childfields : null

  - source : fra:CustomerData
    destination : customer_data
    type : array
    childfields : name,customer_address,telephone_number

  - source : fra:Name
    destination : name
    type : string
    childfields : null

  - source : fra:CustomerAddress
    destination : customer_address
    type : array
    childfields : street,pincode

  - source : fra:Street
    destination : street
    type : string
    childfields : null

  - source : fra:Pincode
    destination : pincode
    type : string
    childfields : null

  - source : fra:TelephoneNumber
    destination : telephone_number
    type : string
    childfields : null

When the type is array, I need the rows named in childFields to be nested under that entry instead of being written as separate top-level entries. So the desired output will be:

  - source : fra:AppData
    destination : app_data
    type : array
    childfields : application_id,institute_nm
      - source : fra:ApplicationId
        destination : application_id
        type : string
        childfields : null

      - source : fra:InstituteName
        destination : institute_nm
        type : string
        childfields : null

  - source : fra:CustomerData
    destination : customer_data
    type : array
    childfields : name,customer_address,telephone_number
      - source : fra:Name
        destination : name
        type : string
        childfields : null

      - source : fra:CustomerAddress
        destination : customer_address
        type : array
        childfields : street,pincode
          - source : fra:Street
            destination : street
            type : string
            childfields : null

          - source : fra:Pincode
            destination : pincode
            type : string
            childfields : null

      - source : fra:TelephoneNumber
        destination : telephone_number
        type : string
        childfields : null

How can I get this?

Upvotes: 2

Views: 14218

Answers (1)

flyx

Reputation: 39708

You are currently not using a YAML library to generate the output. This is bad practice because nothing checks whether the strings you emit contain YAML special characters, which would require them to be quoted.
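
To make the quoting point concrete, here is a minimal sketch, assuming PyYAML is installed (it is the library imported as yaml further down). The `comment` key and its value are made up for illustration and are not part of your CSV. It compares a line built by string concatenation with the same data passed through `yaml.safe_dump`:

```python
import yaml  # PyYAML

# "null" is a plain string in the CSV; the "comment" entry is a hypothetical
# value that contains a YAML indicator sequence (colon followed by a space).
row = {"childfields": "null", "comment": "a: b"}

# Hand-built text, in the style of the question's script.
naive_text = "childfields : " + row["childfields"] + "\n"
naive_loaded = yaml.safe_load(naive_text)

# Library-built text: the dumper adds quotes wherever a plain scalar
# would be read back as something other than the original string.
dumped = yaml.safe_dump(row, default_flow_style=False)
round_tripped = yaml.safe_load(dumped)

print(naive_loaded)   # the string "null" was read back as a YAML null
print(dumped)
```

The hand-built line loses the distinction between the string "null" and an actual null value, while the dumped text loads back to the original dictionary.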

Next up, this is not valid YAML:

    childfields : application_id,institute_nm
      - source : fra:ApplicationId
        destination : application_id
        type : string
        childfields : null

childfields cannot have both a scalar value (application_id,institute_nm) and a sequence value (starting with the item - source : fra:ApplicationId).

Try generating your structure with lists and dicts and then dump that structure:

import csv
import yaml  # PyYAML

result = []
type_index = -1
child_fields_index = -1

with open('custInfo.csv', 'r') as csvfile:
  datareader = csv.reader(csvfile, delimiter=",", quotechar='"')
  for row_index, row in enumerate(datareader):
    if row_index == 0:
      # normalize the headings once and remember where the
      # "type" and "childfields" columns are
      data_headings = []
      for heading_index, heading in enumerate(row):
        fixed_heading = heading.lower().replace(" ", "_").replace("-", "")
        data_headings.append(fixed_heading)
        if fixed_heading == "type":
          type_index = heading_index
        elif fixed_heading == "childfields":
          child_fields_index = heading_index
    else:
      content = {}
      # read the type directly so the check does not depend on column order
      is_array = type_index >= 0 and row[type_index] == "array"
      for cell_index, cell in enumerate(row):
        if cell_index == child_fields_index and is_array:
          # expand the comma-separated names into a list of mappings;
          # capitalize() only upper-cases the first letter,
          # e.g. "application_id" becomes "Application_id"
          content[data_headings[cell_index]] = [{
              "source" : "fra:" + value.capitalize(),
              "destination" : value,
              "type" : "string",
              "childfields" : "null"
            } for value in cell.split(",")]
        else:
          content[data_headings[cell_index]] = cell
      result.append(content)
print(yaml.dump(result))
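
The code above derives each child entry from the name in childfields. If you would rather nest the actual CSV rows under their parent, as in your desired output, something along these lines might work. It is only a sketch under a few assumptions: the file is comma-delimited with the multi-value childFields cells quoted, every child name matches the destination of exactly one row, and there are no cycles. `SAMPLE_CSV` and `build_tree` are names made up for this example, and it uses only the standard library so the nesting step can be read on its own:

```python
import csv
import io

# Inline copy of the sample data from the question (assumed to be
# comma-delimited with quoted childFields); a real script would read
# custInfo.csv instead.
SAMPLE_CSV = '''source,destination,type,childFields
fra:AppData,app_data,array,"application_id,institute_nm"
fra:ApplicationId,application_id,string,null
fra:InstituteName,institute_nm,string,null
fra:CustomerData,customer_data,array,"name,customer_address,telephone_number"
fra:Name,name,string,null
fra:CustomerAddress,customer_address,array,"street,pincode"
fra:Street,street,string,null
fra:Pincode,pincode,string,null
fra:TelephoneNumber,telephone_number,string,null
'''


def build_tree(csv_text):
    """Nest each row under the array row that names it in childFields."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    by_destination = {row["destination"]: row for row in rows}

    # Every name listed as a child must not appear again at the top level.
    child_names = set()
    for row in rows:
        if row["type"] == "array":
            child_names.update(row["childFields"].split(","))

    def to_node(row):
        node = {
            "source": row["source"],
            "destination": row["destination"],
            "type": row["type"],
        }
        if row["type"] == "array":
            # Replace the comma-separated names with the child rows themselves.
            # Raises KeyError if a name has no matching row.
            node["childfields"] = [
                to_node(by_destination[name])
                for name in row["childFields"].split(",")
            ]
        else:
            # Keep the raw cell text (the string "null" in the sample data).
            node["childfields"] = row["childFields"]
        return node

    return [to_node(row) for row in rows if row["destination"] not in child_names]


tree = build_tree(SAMPLE_CSV)
```

Passing `tree` to `yaml.safe_dump(tree, default_flow_style=False, sort_keys=False)` should then give block-style YAML with the keys in insertion order; `sort_keys` needs PyYAML 5.1 or later.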

Upvotes: 7
