Reputation: 135
I'm working with a Python Worksheet and trying to download CSV data from an external URL using the requests library. However, I'm encountering a NameResolutionError that seems to indicate a DNS resolution issue. Here's the error message I'm receiving:
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='www.data.gouv.fr', port=443): Max retries exceeded with url: /fr/datasets/r/5cb21a85-b0b0-4a65-a249-806a040ec372 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7fb7f66e1450>: Failed to resolve 'www.data.gouv.fr' ([Errno -3] Temporary failure in name resolution)"))
The error occurs when I try to execute the following code snippet in a Python Worksheet:
import snowflake.snowpark as snowpark
from snowflake.snowpark.functions import col
import pandas as pd
from io import StringIO
import requests

def load_csv_to_snowflake(session, url, table_name, delimiter, encoding):
    # Download the CSV file's contents from the URL
    response = requests.get(url)
    if response.status_code != 200:
        raise Exception(f"HTTP request failed: {response.status_code}")
    # Read the contents into a pandas DataFrame
    csv_string = response.content.decode(encoding)
    df = pd.read_csv(StringIO(csv_string), delimiter=delimiter)
    # Load the DataFrame into Snowflake using Snowpark
    session.write_pandas(df, table_name, auto_create_table=True)

def main(session: snowpark.Session):
    # List of files to load
    files_to_load = [
        {
            "url": "https://static.data.gouv.fr/resources/lieux-de-vaccination-contre-la-covid-19/20240328-180518/centres-vaccination.json",
            "table_name": "table_file1",
            "delimiter": ";",
            "encoding": "utf-8"
        },
        # Add the other files here with their respective parameters
    ]
    # Load each file
    for file_info in files_to_load:
        load_csv_to_snowflake(
            session,
            file_info["url"],
            file_info["table_name"],
            file_info["delimiter"],
            file_info["encoding"]
        )
    # Print a success message
    print("Files loaded successfully.")
I suspect that the Python UDF environment in Snowflake might not have access to the internet or there are some network restrictions in place. Here are my questions:
1. Has anyone encountered a similar issue with Snowflake's Python UDFs and knows how to resolve it?
2. Is there a way to configure network settings or DNS within the Snowflake Python UDF environment to allow external internet access?
3. Are there any best practices for downloading external data into Snowflake using Python UDFs?
Any help or guidance would be greatly appreciated!
Upvotes: 2
Views: 988
Reputation: 1951
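As Lukasz noted in his comment, you need to configure External Network Access. By default, Python code in Snowflake runs in a sandbox with no outbound network access, which is why the hostname fails to resolve. To reach external services, APIs, etc., you "open the firewall" by creating a network rule that lists the approved hosts and an external access integration that references that rule (plus a secret or security integration if the endpoint requires credentials). The code that makes the request must then be created with that integration attached.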
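As a rough sketch of what that setup could look like, the helper below builds the two SQL statements for a network rule and an external access integration. The object names `data_gouv_rule` and `data_gouv_access` are made up for illustration; the hosts are the two that appear in the question. It assumes your role is allowed to create network rules and integrations, which I can't verify for your account.

```python
def external_access_statements(rule_name, integration_name, hosts):
    """Build the SQL that permits outbound traffic to the given hosts.

    rule_name and integration_name are hypothetical identifiers chosen
    by the caller; hosts is a list of hostnames (optionally host:port).
    """
    host_list = ", ".join(f"'{host}'" for host in hosts)
    return [
        # Egress rule listing the hosts the sandbox may contact
        f"CREATE OR REPLACE NETWORK RULE {rule_name} "
        f"MODE = EGRESS TYPE = HOST_PORT VALUE_LIST = ({host_list})",
        # Integration that makes the rule usable by procedures and UDFs
        f"CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION {integration_name} "
        f"ALLOWED_NETWORK_RULES = ({rule_name}) ENABLED = TRUE",
    ]


def apply_statements(session, statements):
    """Run each statement through an existing Snowpark session."""
    for statement in statements:
        session.sql(statement).collect()


statements = external_access_statements(
    "data_gouv_rule",      # hypothetical rule name
    "data_gouv_access",    # hypothetical integration name
    ["www.data.gouv.fr", "static.data.gouv.fr"],
)
for statement in statements:
    print(statement)
```

You can also paste the printed statements into a SQL worksheet instead of calling `apply_statements`. Afterwards, the stored procedure or UDF that calls `requests.get` has to list the integration in its `EXTERNAL_ACCESS_INTEGRATIONS = (...)` clause, otherwise the sandbox still blocks the request. Both hosts are included because the error in the question mentions `www.data.gouv.fr` while the code uses `static.data.gouv.fr`.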
Upvotes: 1