Reputation: 754
I've set up a small EMR cluster with Hive/Presto installed; I want to query files on S3 and import them into Postgres on RDS.
To run queries on S3 and save the results in a table in Postgres, I've done the following:
Manually SSH into each of the 3 nodes and add a new catalog file:
/etc/presto/conf.dist/catalog/postgres.properties
with the following contents
connector.name=postgresql
connection-url=jdbc:postgresql://ip-to-postgres:5432/database
connection-user=<user>
connection-password=<pass>
and edited this file
/etc/presto/conf.dist/config.properties
adding
datasources=postgresql,hive
Restart Presto by running the following manually on all 3 nodes:
sudo restart presto-server
This setup seems to work well.
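For reference, once both catalogs are visible, the S3-to-Postgres copy can be done with a single cross-catalog statement through presto-cli on the master node. This is only a sketch: the names hive.default.s3_events and postgres.public.events are placeholders, the target table is assumed to already exist in Postgres, and 8889 is the coordinator port EMR normally uses for Presto.
# Sketch only: schema/table names are placeholders; adjust the port if your cluster differs.
presto-cli --server localhost:8889 --execute "INSERT INTO postgres.public.events SELECT * FROM hive.default.s3_events"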
In my application, there are multiple databases created dynamically. It seems that those configuration/catalog changes need to be made for each database and the server needs to be restarted to see the new config changes.
Is there a proper way for my application (using boto or other methods) to update these configurations and restart the server on every node, without manually SSHing into each one?
Upvotes: 4
Views: 4937
Reputation: 1219
I believe you can run a simple bash script to achieve what you want. There is no other way except creating a new cluster with the --configurations parameter, where you provide the desired configurations. You can run the script below from the master node.
#!/bin/bash
# Write the private IP address of each running node to cluster_nodes.txt.
aws emr list-instances --cluster-id <cluster-id> --instance-states RUNNING | grep PrivateIpAddress | sed 's/"PrivateIpAddress"://' | sed 's/[",]//g' | awk '{gsub(/^[ \t]+|[ \t]+$/,""); print;}' > cluster_nodes.txt
# For each IP: copy the catalog file over ssh, fix ownership/permissions, and restart Presto.
while IFS='' read -r line || [[ -n "$line" ]]; do
    echo "Connecting $line"
    scp -i <PEM file> postgres.properties hadoop@$line:/tmp
    ssh -i <PEM file> hadoop@$line "sudo mv /tmp/postgres.properties /etc/presto/conf/catalog; sudo chown presto:presto /etc/presto/conf/catalog/postgres.properties; sudo chmod 644 /etc/presto/conf/catalog/postgres.properties; sudo restart presto-server"
done < cluster_nodes.txt
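As a possible simplification (not part of the original answer), the grep/sed/awk chain can be replaced by the CLI's built-in JMESPath filter; <cluster-id> is still a placeholder:
# Same IP list via --query; --output text prints the addresses tab-separated.
aws emr list-instances --cluster-id <cluster-id> --instance-states RUNNING \
    --query 'Instances[].PrivateIpAddress' --output text | tr '\t' '\n' > cluster_nodes.txt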
Upvotes: 2
Reputation: 4207
You can provide the configuration via the management console (in the Edit software settings box when creating the cluster), or you can use the awscli to pass those configurations as follows:
#!/bin/bash
JSON=`cat <<JSON
[
  {
    "Classification": "presto-connector-postgresql",
    "Properties": {
      "connection-url": "jdbc:postgresql://ip-to-postgres:5432/database",
      "connection-user": "<user>",
      "connection-password": "<password>"
    },
    "Configurations": []
  }
]
JSON`
aws emr create-cluster --configurations "$JSON" # ... rest of the params
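Alternatively (not shown above), the same JSON can be kept in a file and passed with the CLI's file:// prefix, which avoids the heredoc; the filename here is just an example:
# presto-postgres.json holds the JSON array shown above.
aws emr create-cluster --configurations file://./presto-postgres.json # ... rest of the params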
Upvotes: 0
Reputation: 1073
During provisioning of your cluster: you can provide the configuration details up front. Refer to Presto Connector Configuration for how to add this automatically when the cluster is provisioned.
Upvotes: 0