Reputation: 946
I am new to pyspark, to spark in general, and to AWS.
I tried saving a table using:
# Save distinct domains dataframe into SQL table
distinct_domains.write.saveAsTable('distinct_domains', mode='ignore', compression='lz4', header=True)
I thought I was saving a SQL table, but apparently this is a Hive table (which I just found out exists).
I read in one post that it goes to the location s3://my_bucket_name/warehouse, and in yet another post that it goes to hdfs://user/hive/warehouse.
I can't find this table anywhere. Please help.
Upvotes: 0
Views: 42
Reputation: 894
You could try one of the approaches below.
1) Pass an explicit path when saving (assuming df is your DataFrame):
df.write.partitionBy('col1') \
    .saveAsTable('test_table', format='parquet', mode='overwrite',
                 path='s3a://bucket/foo')
2) You can create a temporary view using
myDf.createOrReplaceTempView("tempTable")
Then, using the SQLContext, you can create a Hive table from that view:
sqlContext.sql("create table table_name as select * from tempTable")
Upvotes: 1