Reputation: 279
I have a csv file that is formatted as follows:
Image Id,URL,Latitude,Longitude
17609472165,https://farm8.staticflickr.com/7780/17609472165_c44d9b5a0e_q.jpg,48.843226,2.31805
11375512374,https://farm6.staticflickr.com/5494/11375512374_66a4d9af6c_q.jpg,48.844166,2.376
24768920940,https://farm2.staticflickr.com/1571/24768920940_634cc06f43_q.jpg,48.844619,2.395897
9411072065,https://farm8.staticflickr.com/7368/9411072065_5e2083a32e_q.jpg,48.844666,2.3725
9996916356,https://farm3.staticflickr.com/2807/9996916356_640c493020_q.jpg,48.844666,2.3725
24281266199,https://farm2.staticflickr.com/1623/24281266199_bf63e25c23_q.jpg,48.844925,2.389616
I want to import this file and, for each line in the file, append the point's latitude and longitude to a 2D array. I have tried code such as the following, but it is not working (or printing anything) and gives the error "ValueError: all the input array dimensions except for the concatenation axis must match exactly":
import numpy
data = open('dataset_import_noaddress', 'r')
A = []
for line in data:
    fields = line.strip().split(',')
    lat = fields[2]
    lon = fields[3]
    print lat
    print lon
    newrow = [lat, lon]
    A = numpy.vstack([A, newrow])
Can anyone suggest why this isn't working or, even better, suggest a better way to achieve the same thing? Thanks!
Upvotes: 1
Views: 5515
Reputation: 2465
You just want to read your CSV into a matrix in which each row is a latitude, longitude pair. So basically, read it and delete the first two columns.
Code
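import numpy as np
# Parse every column as float; skip_header=1 drops the header row
data = np.genfromtxt("dataset.csv", delimiter=",", skip_header=1)
# Remove columns 0 and 1 (Image Id and URL) along axis 1
A = np.delete(data, [0, 1], axis=1)
print(A)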
This reads the whole CSV as floats; any value that cannot be parsed as a float (here, the URLs) is converted to nan. np.delete then removes the first two columns.
Output
[[ 48.843226 2.31805 ]
[ 48.844166 2.376 ]
[ 48.844619 2.395897]
[ 48.844666 2.3725 ]
[ 48.844666 2.3725 ]
[ 48.844925 2.389616]]
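As a variation on the above, np.genfromtxt also accepts a usecols argument, so the two coordinate columns can be selected while reading instead of being deleted afterwards. A minimal sketch, assuming the same column layout as in the question; the inline sample string is only a stand-in for the real file so that the example is self-contained:

```python
import io
import numpy as np

# Stand-in for the CSV file from the question (first two data rows only).
sample = io.StringIO(
    "Image Id,URL,Latitude,Longitude\n"
    "17609472165,https://farm8.staticflickr.com/7780/17609472165_c44d9b5a0e_q.jpg,48.843226,2.31805\n"
    "11375512374,https://farm6.staticflickr.com/5494/11375512374_66a4d9af6c_q.jpg,48.844166,2.376\n"
)

# usecols=(2, 3) keeps only Latitude and Longitude;
# skip_header=1 drops the header row.
coords = np.genfromtxt(sample, delimiter=",", skip_header=1, usecols=(2, 3))
print(coords.shape)
```

With a real file, the path can be passed in place of the StringIO object.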
Upvotes: 1
Reputation: 109736
First, you generally want to use a with open(filename, 'r') as ...: format. One reason for this is that the file will be automatically closed should you encounter an error.
One often uses csv.reader for reading csv files in Python (although you can also read the table using pd.read_csv(...) if you are using Pandas). You then need to iterate over the reader using for line in reader:.
You are getting single variables and creating intermediate lists, using numpy.vstack for each row. It would be more efficient to aggregate everything as a list and then call vstack on the whole list.
A.append(line[2:4]) takes the third and fourth items from the given row (e.g. ['48.843226', '2.31805'], which csv.reader returns as strings) and appends them to the larger list A. You should first ensure the line has four values before appending, keeping track of the bad lines.
Once A has been built, you then call vstack.
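import csv
import numpy as np

filename = 'dataset_import_noaddress'  # file name taken from the question

with open(filename, 'r') as f:
    A = []
    bad_lines = []
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for line in reader:
        if len(line) == 4:
            A.append(line[2:4])
        else:
            bad_lines.append(line)

# Stack once, then convert the string values to floats
A = np.vstack(A).astype(float)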
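As for why the loop in the question fails: A starts as an empty list, which numpy.vstack treats as a row of length 0, so it cannot be stacked on top of a row of length 2. The sketch below reproduces that mismatch in isolation and shows that stacking works once every row has the same length. The variable names are illustrative, and the values are written as floats for simplicity:

```python
import numpy as np

empty = []                       # how A starts in the question
row = [48.843226, 2.31805]       # one [lat, lon] pair

try:
    # Shapes (1, 0) and (1, 2) disagree on the second axis.
    np.vstack([empty, row])
    failed = False
except ValueError as exc:
    failed = True
    print("ValueError:", exc)

# Collect equally sized rows in a plain list first, then stack once.
rows = [row, [48.844166, 2.376]]
stacked = np.vstack(rows)
print(stacked.shape)
```

The exact wording of the error message depends on the NumPy version.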
Upvotes: 3
Reputation: 2779
So, basically you want the latitude and longitude data from the csv file, is that right? I would suggest using pandas' read_csv(); this way there is no need to loop over the file line by line. Pandas can handle all the columns at once.
import numpy as np
import pandas as pd
file_ = pd.read_csv("dataset_import_noaddress", sep=',')
# Select the two coordinate columns by header name
A = np.array(file_[["Latitude", "Longitude"]])
print(A)
array([[ 48.843226, 2.31805 ],
[ 48.844166, 2.376 ],
[ 48.844619, 2.395897],
[ 48.844666, 2.3725 ],
[ 48.844666, 2.3725 ],
[ 48.844925, 2.389616]])
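If only the two coordinate columns are needed, read_csv can also be told to load just those via usecols, and DataFrame.to_numpy() returns the values as a NumPy array. A small sketch, assuming the header names match the question's file exactly; the inline sample text is a stand-in for the real file:

```python
import io
import pandas as pd

# Stand-in for the CSV file from the question (first two data rows only).
sample = io.StringIO(
    "Image Id,URL,Latitude,Longitude\n"
    "17609472165,https://farm8.staticflickr.com/7780/17609472165_c44d9b5a0e_q.jpg,48.843226,2.31805\n"
    "11375512374,https://farm6.staticflickr.com/5494/11375512374_66a4d9af6c_q.jpg,48.844166,2.376\n"
)

# Load only the named columns; the header row is consumed automatically.
df = pd.read_csv(sample, usecols=["Latitude", "Longitude"])

# Fix the column order explicitly before converting to a 2D array.
A = df[["Latitude", "Longitude"]].to_numpy()
print(A.shape)
```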
Upvotes: 1