I'm trying to take a local dataframe and create it as a parquet table in Drill using the PyODBC library. I understand that PyDrill has features better suited for this, but with PyODBC I can already create and read data from the Drill instance; I only struggle with turning a local dataframe into a table in Drill. Below is where I currently am: I'm trying to read the iris dataset from my local machine and create it as a parquet table in Drill.
I'm also wondering if it would be possible to use the PyArrow library and its write_table and write_to_dataset functions to get this done?
## import iris dataset as a sample and write it out in Drill
import pandas as pd
import pyodbc
import pyarrow as pa
import pyarrow.parquet as pq

iris_df = pd.read_csv('iris.csv')

## Use existing ODBC connection to connect to Drill instance
conn = pyodbc.connect("DSN=MaprInstance", uid='rookiejoe', pwd='password', autocommit=True)
cursor = conn.cursor()

## What I'm hoping to do: create a mapr_iris table from iris_df
## (this fails -- iris_df is a local dataframe, not a table Drill can see)
cursor.execute('CREATE TABLE dfs.root.`/temp/mapr_iris` AS SELECT * FROM iris_df')

## Using PyArrow
pq.write_table(pa.Table.from_pandas(iris_df), 'connection string?')
Any help or pointers would be much appreciated. Thanks!