RyanKilkelly

Reputation: 279

Import CSV file and append to an array

I have a csv file that is formatted as follows:

Image Id,URL,Latitude,Longitude
17609472165,https://farm8.staticflickr.com/7780/17609472165_c44d9b5a0e_q.jpg,48.843226,2.31805
11375512374,https://farm6.staticflickr.com/5494/11375512374_66a4d9af6c_q.jpg,48.844166,2.376
24768920940,https://farm2.staticflickr.com/1571/24768920940_634cc06f43_q.jpg,48.844619,2.395897
9411072065,https://farm8.staticflickr.com/7368/9411072065_5e2083a32e_q.jpg,48.844666,2.3725
9996916356,https://farm3.staticflickr.com/2807/9996916356_640c493020_q.jpg,48.844666,2.3725
24281266199,https://farm2.staticflickr.com/1623/24281266199_bf63e25c23_q.jpg,48.844925,2.389616

I want to import this file and, for each line in it, append the point's latitude and longitude to a 2D array. I have tried code such as the following, but it is not working (it does not print anything) and it raises the error "ValueError: all the input array dimensions except for the concatenation axis must match exactly".

import numpy

data  = open('dataset_import_noaddress', 'r')
A = []

for line in data:
    fields = line.strip().split(',')
    lat = fields[2]
    lon = fields[3]
    print lat
    print lon
    newrow = [lat, lon]
    A = numpy.vstack([A, newrow])

Can anyone suggest why this isn't working or, even better, suggest a better way to achieve the same thing? Thanks!

Upvotes: 1

Views: 5515

Answers (3)

Kordi

Reputation: 2465

You just want to read your CSV into a matrix in which each row holds a latitude and a longitude. So basically, read it and delete the first two columns.

Code

import numpy as np

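# genfromtxt accepts a file name directly; "data" avoids shadowing the built-in input()
data = np.genfromtxt("dataset.csv", delimiter=",", skip_header=1)
# drop columns 0 and 1 (Image Id and URL), keeping latitude and longitude
A = np.delete(data, [0, 1], axis=1)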

print(A)

This reads the CSV and converts every non-float value (here, the URL column) to nan. np.delete then removes the first two columns.

Output

[[ 48.843226   2.31805 ]
 [ 48.844166   2.376   ]
 [ 48.844619   2.395897]
 [ 48.844666   2.3725  ]
 [ 48.844666   2.3725  ]
 [ 48.844925   2.389616]]
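
As a possible variation on the above, np.genfromtxt also accepts a usecols argument, so the two coordinate columns can be selected while reading instead of being deleted afterwards. A minimal sketch, assuming the same column layout as in the question; the inline sample data and the load_lat_lon helper name are made up for illustration:

```python
import io
import numpy as np

# Two sample rows copied from the question; this stands in for the real file.
SAMPLE = (
    "Image Id,URL,Latitude,Longitude\n"
    "17609472165,https://farm8.staticflickr.com/7780/17609472165_c44d9b5a0e_q.jpg,48.843226,2.31805\n"
    "11375512374,https://farm6.staticflickr.com/5494/11375512374_66a4d9af6c_q.jpg,48.844166,2.376\n"
)

def load_lat_lon(source):
    """Read only the Latitude and Longitude columns (indices 2 and 3)."""
    return np.genfromtxt(source, delimiter=",", skip_header=1, usecols=(2, 3))

A = load_lat_lon(io.StringIO(SAMPLE))
print(A.shape)
```

load_lat_lon should also accept a file name in place of the StringIO object. Note that a file with a single data row would come back as a 1D array rather than a 2D one.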

Upvotes: 1

Alexander

Reputation: 109736

First, you generally want to use a with open(filename, 'r') as ...: format. One reason for this is that the file will be automatically closed should you encounter an error.

One often uses csv.reader for reading csv files in Python (although you can also read the table using pd.read_csv(...) if you are using Pandas). You then need to iterate over the reader using for line in reader:.

You are getting single variables and creating intermediate lists, using numpy.vstack for each row. It would be more efficient to aggregate everything as a list and then call vstack on the whole list.

A.append(line[2:4]) takes the third and fourth items from the list on the given row (e.g. ['48.843226', '2.31805'], which csv.reader returns as strings) and appends them to the larger list A. You should first ensure the line has exactly four values before appending, keeping track of the bad lines.

Once A has been built, you then call vstack.

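import csv
import numpy as np

with open(filename, 'r') as f:
    A = []
    bad_lines = []
    reader = csv.reader(f)
    next(reader, None)  # skip the header row
    for line in reader:
        if len(line) == 4:
            A.append(line[2:4])  # latitude and longitude, still strings
        else:
            bad_lines.append(line)
    A = np.vstack(A).astype(float)  # stack once, then convert to floats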
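
For what it is worth, the original error seems to come from the very first call in the loop: A starts as an empty list, which numpy.vstack treats as a row of length 0, and that cannot be stacked with a row of length 2. A small sketch of both the failure and the collect-then-convert pattern described above, using made-up in-memory data (with example.com URLs) in place of the file; the names stacking_failed, sample and rows are illustrative only:

```python
import csv
import io
import numpy as np

# Reproduce the failure: the empty list becomes a (1, 0) array,
# the new row a (1, 2) array, and the column counts do not match.
try:
    np.vstack([[], [48.843226, 2.31805]])
    stacking_failed = False
except ValueError:
    stacking_failed = True

# Collect rows in a plain list and convert once at the end.
sample = io.StringIO(
    "Image Id,URL,Latitude,Longitude\n"
    "1,https://example.com/a.jpg,48.843226,2.31805\n"
    "2,https://example.com/b.jpg,48.844166,2.376\n"
)
reader = csv.reader(sample)
next(reader, None)  # skip the header row
rows = [[float(r[2]), float(r[3])] for r in reader if len(r) == 4]
A = np.array(rows)
```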

Upvotes: 3

bninopaul

Reputation: 2779

So, basically you want the latitude and longitude data from the CSV file, is that right? I would suggest using pandas' read_csv(); that way there is no need to loop over the file line by line. Pandas can handle all the columns at once.

import numpy as np
import pandas as pd

file_ = pd.read_csv("dataset_import_noaddress", sep=',')
# select the two coordinate columns by header name and convert to a NumPy array
A = np.array(file_[["Latitude", "Longitude"]])
print(A)

array([[ 48.843226,   2.31805 ],
       [ 48.844166,   2.376   ],
       [ 48.844619,   2.395897],
       [ 48.844666,   2.3725  ],
       [ 48.844666,   2.3725  ],
       [ 48.844925,   2.389616]])
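
If only the two coordinate columns are needed, read_csv can also be asked to parse just those via its usecols parameter. A rough sketch under the assumption that the header names match the question's file; the in-memory sample (with example.com URLs) is invented for illustration, and to_numpy() needs a reasonably recent pandas:

```python
import io
import pandas as pd

# Made-up stand-in for the CSV file from the question.
sample = io.StringIO(
    "Image Id,URL,Latitude,Longitude\n"
    "1,https://example.com/a.jpg,48.843226,2.31805\n"
    "2,https://example.com/b.jpg,48.844166,2.376\n"
)

# Parse only the two coordinate columns.
df = pd.read_csv(sample, usecols=["Latitude", "Longitude"])

# Select explicitly so the column order is latitude, then longitude.
A = df[["Latitude", "Longitude"]].to_numpy()
```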

Upvotes: 1
