Reputation: 303
I would like to view data from MariaDB in Superset. I think routing the data MariaDB --> Spark --> Superset might be the best solution, because I will also use Spark with H2O Sparkling Water.
1. I have tried pip3 install mysqlclient but got this error:
ERROR: Command "/bin/python3 -u -c 'import setuptools, tokenize; ... failed with error code 1 in /tmp/pip-install-kslmastj/mysqlclient/
2. I tried Spark with 2 configuration files but I do not think the data is accessible in Spark or Spark SQL.
File 1 ../conf/spark-defaults.conf
spark.driver.extraClassPath = /usr/share/java/mysql-connector-java.jar
spark.executor.extraClassPath = /usr/share/java/mysql-connector-java.jar
File 2 ../conf/hive-site.xml
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/DBNAME</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>USERNAME</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>PASSWORD</value>
<description>password to use against metastore database</description>
</property>
</configuration>
3. I read about the SQLAlchemy dialects and PyHive. I searched through the superset code and cannot determine where to add the external dialects.
4. I have tried a few configurations in the superset config file. I am wondering if the port should be the Spark port. SQLALCHEMY_DATABASE_URI = 'hive://localhost:4040/'
5. I attempted to import a CSV file but got an error.
NOTE: I can see the MariaDB data in Spark if I enter this at the Scala prompt, but I don't think this is the proper solution.
import org.apache.spark.sql.SQLContext
val sqlcontext = new org.apache.spark.sql.SQLContext(sc)
val dataframe_mysql = sqlcontext.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/DATABASE_NAME").option("driver", "com.mysql.jdbc.Driver").option("dbtable", "TABLE_NAME").option("user", "USER_NAME").option("password", "PASSWORD").load()
dataframe_mysql.show()
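For reference, a rough PySpark equivalent of the Scala snippet above might look like the sketch below. The helper names `jdbc_options` and `load_table` are made up for illustration, and the sketch assumes pyspark is installed and the MySQL connector jar is on the classpath as in spark-defaults.conf; the option keys and placeholder values are the same ones used in the Scala code.

```python
def jdbc_options(database, table, user, password, host="localhost", port=3306):
    """Collect the same JDBC options the Scala snippet passes one by one."""
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "driver": "com.mysql.jdbc.Driver",
        "dbtable": table,
        "user": user,
        "password": password,
    }


def load_table(opts):
    """Read the table through Spark's JDBC source (needs pyspark and the connector jar)."""
    # Imported lazily so jdbc_options() can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    return spark.read.format("jdbc").options(**opts).load()


opts = jdbc_options("DATABASE_NAME", "TABLE_NAME", "USER_NAME", "PASSWORD")
# load_table(opts).show()  # uncomment on a machine where Spark is available
```

Keeping the options in a dictionary makes it easier to swap the table name or credentials without rewriting the whole read call.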
Upvotes: 3
Views: 2012
Reputation: 31
I have connected MariaDB to Superset as my main DB instead of the default SQLite, so this will probably help you connect.
SQLAlchemy needs an extra Python library for this task; in my case I used pymysql. Once it is installed (you can use pip), the connection string uses the prefix mysql+pymysql, so it will look like this:
'mysql+pymysql://user:password@host/dbname'
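If the password contains characters such as '@' or '/', the URI above can break, so it may be safer to build it programmatically. The following is only a minimal sketch: the function name `build_mysql_uri` and the sample credentials are made up for illustration, and port 3306 is assumed because it is the port used in the question.

```python
from urllib.parse import quote_plus


def build_mysql_uri(user, password, host, dbname, port=3306):
    """Assemble a SQLAlchemy URI that uses the pymysql driver."""
    # Percent-encode the credentials so characters such as '@' or '/'
    # are not mistaken for URI delimiters.
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{dbname}"
    )


# Hypothetical credentials, for illustration only.
uri = build_mysql_uri("user", "p@ss/word", "localhost", "dbname")
print(uri)
```

The resulting string can then be assigned to SQLALCHEMY_DATABASE_URI in the Superset config file.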
Upvotes: 2