Reputation: 287
I have a text file that I am trying to convert to a Parquet file and then load into a Hive table by writing it to its HDFS path. Everything runs, but the table shows no values.
Here is my code:
#Create my table
spark.sql("create external table if not exists table1 ( c0 string, c1 string, c2 string) STORED AS parquet LOCATION 'hdfs://hadoop_data/hive/table1'")
hdfs="hdfs://hadoop_data/hive/table1/output.parquet"
#Read my data file
e=spark.read.text("/home/path/sample_txt_files/sample5.txt")
#Write it to hdfs table as a parquet file
e.write.parquet(hdfs)
Everything runs, but when I check the contents of the table with select * from table1, no values are there.
The content in the sample5.txt file goes like this:
ID,Name,Age
1,James,15
Content inside the .parquet file:
Any ideas or suggestions as to why no data is showing in the table?
Upvotes: 1
Views: 13188
Reputation: 31460
Did you try setting these parameters in the Hive shell? You are writing to the hdfs://hadoop_data/hive/table1/output.parquet
directory, but the table is created on hdfs://hadoop_data/hive/table1/
, so output.parquet
is a nested directory.
SET hive.mapred.supports.subdirectories=TRUE;
SET mapred.input.dir.recursive=TRUE;
Then check whether you are able to see data in the Hive table.
(or)
Try inserting the data into the table directly
using the .insertInto
function:
e.write.format("parquet").insertInto("default.table1")
Since you are reading a text file, Spark loads everything as a single column (value), even though the file has 3 columns.
from pyspark.sql.functions import split, col
e=spark.read.text("/home/path/sample_txt_files/sample5.txt") #returns a dataframe with one "value" column
f=e.withColumn("c0",split(col("value"),",")[0]).withColumn("c1",split(col("value"),",")[1]).withColumn("c2",split(col("value"),",")[2]).drop("value") #split the value column on "," and extract the three fields
f.write.format("parquet").insertInto("default.table1")
If you have a CSV file (or any other delimited file),
use spark.read.csv()
with options to read the file.
Upvotes: 2
Reputation: 725
I would check the underlying parquet data types against your Hive schema.
That said, id, name, and age are all string in the Hive table,
but when you write out the parquet, the data types of id and age might be int instead of string.
Upvotes: 0