Reputation: 4870
Is there any library to convert a CSV file to an Avro file in Java or Scala?
I tried to google it, but was not able to find any library for it.
Upvotes: 2
Views: 11042
Reputation: 131
You can use Spark for this purpose, or spark-shell (with the option --packages org.apache.spark:spark-avro...) if it's for ad-hoc usage.
A sample code example:
// Read the CSV into a DataFrame
val df = spark.read.csv("example.csv")
// Write the DataFrame out in Avro format
df.write.format("com.databricks.spark.avro").save("example.avro")
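Note that with the org.apache.spark:spark-avro module mentioned above (Spark 2.4+), you can simply use the short format name avro instead of the Databricks class name. A minimal sketch, assuming the CSV has a header row:
// Spark 2.4+ with the spark-avro module on the classpath
// (e.g. spark-shell --packages org.apache.spark:spark-avro_2.12:<your Spark version>)
val df = spark.read.option("header", "true").csv("example.csv")
df.write.format("avro").save("example_avro")   // Spark writes a directory of Avro part files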
Upvotes: 1
Reputation: 11244
You could try it this way (Spark 1.6).
people.csv
Michael, 29
Andy, 30
Justin, 19
PySpark
# Read the CSV as plain text and split each line into columns
file = sc.textFile("people.csv")
df = file.map(lambda line: line.split(',')).toDF(['name','age'])
>>> df.show()
+-------+---+
| name|age|
+-------+---+
|Michael| 29|
| Andy| 30|
| Justin| 19|
+-------+---+
df.write.format("com.databricks.spark.avro").save("peopleavro")
peopleavro now contains:
{u'age': u' 29', u'name': u'Michael'}
{u'age': u' 30', u'name': u'Andy'}
{u'age': u' 19', u'name': u'Justin'}
If you need to maintain data types, create a schema and pass it.
from pyspark.sql.types import StructType, StructField, StringType, IntegerType
schema = StructType([StructField("name", StringType(), True), StructField("age", IntegerType(), True)])
# cast age to int so the values match the IntegerType declared in the schema
df = file.map(lambda line: line.split(',')).map(lambda p: (p[0], int(p[1]))).toDF(schema)
>>> df.printSchema()
root
|-- name: string (nullable = true)
|-- age: integer (nullable = true)
Now your Avro schema looks like this:
{
  "type" : "record",
  "name" : "topLevelRecord",
  "fields" : [ {
    "name" : "name",
    "type" : [ "string", "null" ]
  }, {
    "name" : "age",
    "type" : [ "int", "null" ]
  } ]
}
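Since the question asks about Scala, here is a rough Scala equivalent of the same schema-based approach (just a sketch, assuming Spark 1.6 with the com.databricks:spark-avro package on the classpath):
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val schema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("age", IntegerType, nullable = true)))

// Split each CSV line and cast the age column to Int so it matches the schema
val rows = sc.textFile("people.csv")
  .map(_.split(","))
  .map(p => Row(p(0), p(1).trim.toInt))

val df = sqlContext.createDataFrame(rows, schema)
df.write.format("com.databricks.spark.avro").save("peopleavro")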
Upvotes: 2
Reputation: 606
You can do it easily by:
Upvotes: 0
Reputation: 7845
By googling, I found this article: https://dzone.com/articles/convert-csv-data-avro-data
Quoting:
To convert CSV data to Avro data using Hive, we need to follow the steps below:
Example: using a CSV with columns (student_id, subject_id, marks)
--1. Create a Hive table stored as textfile
USE test;
CREATE TABLE csv_table (
student_id INT,
subject_id INT,
marks INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
--2. Load csv_table with student.csv data
LOAD DATA LOCAL INPATH "/path/to/student.csv" OVERWRITE INTO TABLE test.csv_table;
--3. Create another Hive table using AvroSerDe
CREATE TABLE avro_table
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.literal'='{
  "namespace": "com.rishav.avro",
  "name": "student_marks",
  "type": "record",
  "fields": [
    { "name": "student_id", "type": "int" },
    { "name": "subject_id", "type": "int" },
    { "name": "marks", "type": "int" }
  ]
}');
--4. Load avro_table with data from csv_table
INSERT OVERWRITE TABLE avro_table SELECT student_id, subject_id, marks FROM csv_table;
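To sanity-check the result, the Avro files written for avro_table can be read back with spark-avro. A minimal Scala sketch; the warehouse path below is an assumption and depends on your hive.metastore.warehouse.dir setting:
// Hypothetical warehouse location for the test.db database; adjust to your setup
val avroDf = sqlContext.read
  .format("com.databricks.spark.avro")
  .load("/user/hive/warehouse/test.db/avro_table")
avroDf.show()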
Upvotes: 2