Reputation: 101
I have a csv file which I want to convert to Avro; because there are no tools to convert directly from csv to avro, i decided to use an online tool to convert from csv to json and then I plan to write a small program to convert that json to convert to an avro model. The problem I have is that the csv file looks like this
H1, H2, H3
------------------
A1, A2, A3, A4, A4
B1, B2, B3, B4
C1, C2, C3
every column after H3 is part of the H3 array. The existing tools dont seem to have this feature included (this one says it does, but im not sure how to use it: http://www.convertcsv.com/csv-to-json.htm). Does anybody know how to deal with this.
Thanks, Shripadsrd
Upvotes: 1
Views: 699
Reputation: 139
This can be accomplished in Powershell.
Import-Csv "foo.csv" | ConvertTo-Json | Add-Content -Path "foo.json"
You can add the -compress switch to remove newlines too
Import-Csv "foo.csv" | ConvertTo-Json -compress | Add-Content -Path "foo.json"
Upvotes: 1
Reputation: 14655
I am not familar with Avro but the following jq filter will perform the transformation
Jona Rodrigues describes. Specifically: if the file filter.jq
contains
[
split("\n") # split string into lines
| (.[0] | split(",")) as $headers # split header
| (.[1:][] | split(",")) # split data rows
| select(length>0) # get rid of empty lines
| $headers[:-1] as $h1 # fixed headers
| .[:($h1|length)] as $p1 # fixed part
| .[($h1|length):] as $p2 # variable part
| (
[ [ $h1, $p1 ] # \
| transpose[] # \ assemble fixed object
| {key:.[0], value:.[1]} # / from fixed keys and values
] | from_entries # /
) + {
($headers[-1]): $p2 # assemble variable object
}
]
and data
contains
H1,H2,H3
A1,A2,A3,A4,A4
B1,B2,B3,B4
C1,C2,C3
then the command
jq -M -R -s -r -f filter.jq data
will produce
[
{
"H1": "A1",
"H2": "A2",
"H3": [
"A3",
"A4",
"A4"
]
},
{
"H1": "B1",
"H2": "B2",
"H3": [
"B3",
"B4"
]
},
{
"H1": "C1",
"H2": "C2",
"H3": [
"C3"
]
}
]
Upvotes: 0
Reputation: 980
What you are looking for is wrapping your JSON result based on some nesting parameters. Basically going from this :
H1, H2, H3
------------------
A1, A2, A3, A4, A4
B1, B2, B3, B4
C1, C2, C3
To this:
[
{
"H1": "A1",
"H2": "A2",
"H3": [
"A3",
"A4",
"A4"
]
},
{
"H1": "B1",
"H2": "B2",
"H3": [
"B3",
"B4"
]
},
{
"H1": "C1",
"H2": "C2",
"H3": [
"C3"
]
}
]
I created a tool to solve this type of problems at http://csvtojson.com
What you need to do, is adapt your header so that it explicitly states how you want to nest your result. For your example, it would look like :
H1, H2, H3.0, H3.1, H3.2
A1, A2, A3, A4, A4
B1, B2, B3, B4
C1, C2, C3
Notice how H3
is repeated using dots H3.0
, H3.1
, H3.2
: these dots represents how you want to wrap your results into an H3
array. Eventually, press the "Convert button" and your result will be prompted to you.
Hope it helps.
Upvotes: 0