shripadsrd
shripadsrd

Reputation: 101

Csv to Json converter tool

I have a csv file which I want to convert to Avro; because there are no tools to convert directly from csv to avro, i decided to use an online tool to convert from csv to json and then I plan to write a small program to convert that json to convert to an avro model. The problem I have is that the csv file looks like this

H1, H2, H3
------------------
A1, A2, A3, A4, A4
B1, B2, B3, B4
C1, C2, C3

every column after H3 is part of the H3 array. The existing tools dont seem to have this feature included (this one says it does, but im not sure how to use it: http://www.convertcsv.com/csv-to-json.htm). Does anybody know how to deal with this.

Thanks, Shripadsrd

Upvotes: 1

Views: 699

Answers (3)

Danny Rodriguez
Danny Rodriguez

Reputation: 139

This can be accomplished in Powershell.

Import-Csv "foo.csv" | ConvertTo-Json | Add-Content -Path "foo.json"

You can add the -compress switch to remove newlines too

Import-Csv "foo.csv" | ConvertTo-Json -compress | Add-Content -Path "foo.json"

Upvotes: 1

jq170727
jq170727

Reputation: 14655

I am not familar with Avro but the following jq filter will perform the transformation Jona Rodrigues describes. Specifically: if the file filter.jq contains

[
  split("\n")                           # split string into lines
| (.[0]    | split(",")) as $headers    # split header
| (.[1:][] | split(","))                # split data rows
| select(length>0)                      # get rid of empty lines
| $headers[:-1] as $h1                  # fixed headers
| .[:($h1|length)] as $p1               # fixed part
| .[($h1|length):] as $p2               # variable part
| (
     [   [ $h1, $p1 ]                   # \  
       | transpose[]                    #  \ assemble fixed object
       | {key:.[0], value:.[1]}         #  / from fixed keys and values
     ] | from_entries                   # /
  ) + {
     ($headers[-1]): $p2                # assemble variable object
  }
]

and data contains

H1,H2,H3
A1,A2,A3,A4,A4
B1,B2,B3,B4
C1,C2,C3

then the command

jq -M -R -s -r -f filter.jq data    

will produce

[
  {
    "H1": "A1",
    "H2": "A2",
    "H3": [
      "A3",
      "A4",
      "A4"
    ]
  },
  {
    "H1": "B1",
    "H2": "B2",
    "H3": [
      "B3",
      "B4"
    ]
  },
  {
    "H1": "C1",
    "H2": "C2",
    "H3": [
      "C3"
    ]
  }
]

Upvotes: 0

Jona Rodrigues
Jona Rodrigues

Reputation: 980

What you are looking for is wrapping your JSON result based on some nesting parameters. Basically going from this :

H1, H2, H3
------------------
A1, A2, A3, A4, A4
B1, B2, B3, B4
C1, C2, C3 

To this:

[
    {
        "H1": "A1",
        "H2": "A2",
        "H3": [
            "A3",
            "A4",
            "A4"
        ]
    },
    {
        "H1": "B1",
        "H2": "B2",
        "H3": [
            "B3",
            "B4"
        ]
    },
    {
        "H1": "C1",
        "H2": "C2",
        "H3": [
            "C3"
        ]
    }
]

I created a tool to solve this type of problems at http://csvtojson.com

What you need to do, is adapt your header so that it explicitly states how you want to nest your result. For your example, it would look like :

H1, H2, H3.0, H3.1, H3.2
A1, A2, A3, A4, A4
B1, B2, B3, B4
C1, C2, C3 

Notice how H3 is repeated using dots H3.0, H3.1, H3.2: these dots represents how you want to wrap your results into an H3 array. Eventually, press the "Convert button" and your result will be prompted to you.

Hope it helps.

Upvotes: 0

Related Questions