Esben Eickhardt

Reputation: 3862

Read csv from url one line at the time in Python 3.X

I have to read an online csv-file into a postgres database, and in that context I have some problems reading it properly.

If I just import the file it reads as bytes, so I have to decode it. During the decoding, however, the entire file seems to be turned into one long string.

# Libraries
import csv
import urllib.request

# Function for importing csv from url
def csv_import(url):
    url_open = urllib.request.urlopen(url)
    csvfile = csv.reader(url_open.decode('utf-8'), delimiter=',') 
    return csvfile;

# Reading file
p_pladser = csv_import("http://wfs-kbhkort.kk.dk/k101/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=k101:p_pladser&outputFormat=csv&SRSNAME=EPSG:4326")

When I try to read the imported file line by line it only reads one character at the time.

for row in p_pladser:
    print(row)
    break

['F']

Can you help me identify where it goes wrong? I am using Python 3.6.

EDIT: Per request, my solution in R:

# Loading library
library(RPostgreSQL)

# Reading dataframe
p_pladser = read.csv("http://wfs-kbhkort.kk.dk/k101/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=k101:p_pladser&outputFormat=csv&SRSNAME=EPSG:4326", encoding = "UTF-8", stringsAsFactors = FALSE)

# Creating database connection
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, dbname = "secretdatabase", host = "secrethost", user = "secretuser", password = "secretpassword")

# Uploading dataframe to postgres database
dbWriteTable(con, "p_pladser", p_pladser , append = TRUE, row.names = FALSE, encoding = "UTF-8")

I have to upload several tables of 10,000 to 100,000 rows, and in total it takes 1-2 seconds in R to upload them all.

Upvotes: 2

Views: 6015

Answers (2)

Serge Ballesta

Reputation: 149095

csv.reader expects a file-like object as its argument, not a string. You have 2 options here:

  • either you read the data into a string (as you currently do) and then use an io.StringIO to build a file-like object around that string:

    def csv_import(url):
        # requires: import io
        url_open = urllib.request.urlopen(url)
        csvfile = csv.reader(io.StringIO(url_open.read().decode('utf-8')), delimiter=',')
        return csvfile
    
  • or you use an io.TextIOWrapper around the binary stream provided by urllib.request:

    def csv_import(url):
        # requires: import io
        url_open = urllib.request.urlopen(url)
        csvfile = csv.reader(io.TextIOWrapper(url_open, encoding='utf-8'), delimiter=',')
        return csvfile
    
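The distinction matters because csv.reader simply iterates whatever you hand it; given a plain string, iteration yields one character at a time, which is exactly the `['F']` result in the question. A minimal offline sketch of both behaviours (io.BytesIO stands in for the HTTP response here, purely for illustration):

```python
import csv
import io

# A small CSV payload standing in for the bytes returned by urlopen().
raw = b'name,spots\n"Vester Voldgade",10\n"Njalsgade",25\n'

# Passing the decoded string directly: csv.reader iterates characters,
# so each "row" is a single character.
rows_wrong = list(csv.reader(raw.decode('utf-8')))
print(rows_wrong[0])  # ['n']

# Wrapping the byte stream in TextIOWrapper: proper line-by-line parsing.
rows_right = list(csv.reader(io.TextIOWrapper(io.BytesIO(raw), encoding='utf-8')))
print(rows_right[1])  # ['Vester Voldgade', '10']
```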

Upvotes: 3

Vivek Harikrishnan

Reputation: 866

How about loading the CSV with pandas? (Note: naming the result `df` rather than `csv` avoids shadowing the csv module.)

import pandas as pd
df = pd.read_csv("http://wfs-kbhkort.kk.dk/k101/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=k101:p_pladser&outputFormat=csv&SRSNAME=EPSG:4326")
print(df.columns)

OR if you have the CSV downloaded on your machine, then directly

df = pd.read_csv("<path_to_csv>")
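Since the end goal is a postgres table, pandas can also handle the upload step via DataFrame.to_sql, mirroring the dbWriteTable call in the R solution. A sketch under stated assumptions: the sample data and table name are placeholders, and an in-memory sqlite3 connection is used here only so the snippet runs without a database server (for Postgres you would instead pass an SQLAlchemy engine, e.g. create_engine("postgresql://user:password@host/dbname")):

```python
import sqlite3
import pandas as pd

# Sample rows standing in for the downloaded CSV (illustrative only).
df = pd.DataFrame({"navn": ["Vester Voldgade", "Njalsgade"],
                   "antal_pladser": [10, 25]})

# sqlite3 stands in for a Postgres connection so this sketch is runnable;
# swap in an SQLAlchemy engine for a real postgres upload.
con = sqlite3.connect(":memory:")
df.to_sql("p_pladser", con, if_exists="append", index=False)

# Verify the rows arrived.
print(con.execute("SELECT COUNT(*) FROM p_pladser").fetchone()[0])  # 2
```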

You may also consider passing the delimiter and quotechar arguments to csv.reader, because the CSV contains quoted fields. Something like this:

with open('p_pladser.csv') as f:
    rows = csv.reader(f, delimiter=',', quotechar='"')
    for row in rows:
        print(row)

Upvotes: 2
