Reputation: 3502
I am trying to insert a Pandas dataframe into Clickhouse.
This is my code
import pandas
import sqlalchemy as sa
uri = 'clickhouse://default:@localhost/default'
ch_db = sa.create_engine(uri)
pdf = pandas.DataFrame.from_records([
{'year': 1994, 'first_name': 'Vova'},
{'year': 1995, 'first_name': 'Anja'},
{'year': 1996, 'first_name': 'Vasja'},
{'year': 1997, 'first_name': 'Petja'},
])
pdf.to_sql('test_humans', ch_db, if_exists='append', index=False)
And this is the error that I am receiving. Does this have to do with some missing extra arguments regarding the ENGINE? How can I fix this?
Exception: Code: 62, e.displayText() = DB::Exception: Syntax error: failed at position 65 (line 7, col 2): FORMAT TabSeparatedWithNamesAndTypes. Expected one of: ENGINE, storage definition (version 19.15.2.2 (official build))
Logging
INFO:sqlalchemy.engine.base.Engine:EXISTS TABLE test_humans INFO:sqlalchemy.engine.base.Engine:{} INFO:sqlalchemy.engine.base.Engine: CREATE TABLE test_humans ( first_name TEXT, year BIGINT )
INFO:sqlalchemy.engine.base.Engine:{} INFO:sqlalchemy.engine.base.Engine:ROLLBACK
Upvotes: 6
Views: 36920
Reputation: 11242
You can do it without sqlalchemy
.
pip install clickhouse-driver
from clickhouse_driver import Client
client = Client('localhost')
df = pandas.DataFrame.from_records([
{'year': 1994, 'first_name': 'Vova'},
{'year': 1995, 'first_name': 'Anja'},
{'year': 1996, 'first_name': 'Vasja'},
{'year': 1997, 'first_name': 'Petja'},
])
client.insert_dataframe(
'INSERT INTO "your_table" (year, first_name) VALUES',
df,
settings=dict(use_numpy=True),
)
# or
client.execute("INSERT INTO your_table VALUES", df.to_dict('records'))
Upvotes: 5
Reputation: 15226
sqlalchemy-clickhouse cannot create table automatically (at least until version 0.1.5.post0 inclusively) and interprets any sql-query of table creation as SELECT-query that always tail by FORMAT TabSeparatedWithNamesAndTypes-clause.
To work around this issue need to create a table manually using the infi.clickhouse_orm-api (this module delivered with sqlalchemy-clickhouse):
import pandas as pd
from infi.clickhouse_orm.engines import Memory
from infi.clickhouse_orm.fields import UInt16Field, StringField
from infi.clickhouse_orm.models import Model
from sqlalchemy import create_engine
# define the ClickHouse table schema
class Test_Humans(Model):
year = UInt16Field()
first_name = StringField()
engine = Memory()
engine = create_engine('clickhouse://default:@localhost/test')
# create table manually
with engine.connect() as conn:
conn.connection.create_table(Test_Humans) # https://github.com/Infinidat/infi.clickhouse_orm/blob/master/src/infi/clickhouse_orm/database.py#L142
pdf = pd.DataFrame.from_records([
{'year': 1994, 'first_name': 'Vova'},
{'year': 1995, 'first_name': 'Anja'},
{'year': 1996, 'first_name': 'Vasja'},
{'year': 1997, 'first_name': 'Petja'},
# ! sqlalchemy-clickhouse ignores the last item so add fake one
{}
])
pdf.to_sql('test_humans', engine, if_exists='append', index=False)
Take into account that sqlalchemy-clickhouse ignores the last item so add fake one (see source code and related issue 10).
Let's check this table in DB:
SELECT *
FROM test.test_humans
┌─year─┬─first_name─┐
│ 1994 │ Vova │
│ 1995 │ Anja │
│ 1996 │ Vasja │
│ 1997 │ Petja │
└──────┴────────────┘
4 rows in set. Elapsed: 0.003 sec.
*/
See the answer https://stackoverflow.com/a/68843676/303298 too.
Upvotes: 1
Reputation: 533
Starting with version 0.2.0 clickhouse_driver implements method insert_dataframe
. See: https://clickhouse-driver.readthedocs.io/en/latest/api.html#clickhouse_driver.Client.insert_dataframe
Upvotes: 6
Reputation: 31
You can also use https://github.com/kszucs/pandahouse to insert your dataframe without extra conversions.
pip install pandahouse
import pandahouse as ph
pdf = pandas.DataFrame.from_records([
{'year': 1994, 'first_name': 'Vova'},
{'year': 1995, 'first_name': 'Anja'},
{'year': 1996, 'first_name': 'Vasja'},
{'year': 1997, 'first_name': 'Petja'},
])
connection = dict(database='default',
host='localhost',
user='default',
password='')
ph.to_clickhouse(pdf, 'test_humans', index=False, chunksize=100000, connection=connection)
Upvotes: 3