Reputation: 8493
I have a CSV with 3 fields: name, latitude, longitude. A row looks like this:
Place 1,73.992964,40.739037
What's the correct way to mongoimport latitude and longitude into a loc field? I understand that geospatially indexed fields need to be ordered longitude,latitude and stored as a single array rather than 2 discrete fields for latitude and longitude, but I can't see whether there is a way to go from the discrete values to that array via mongoimport.
Do I need to first transform the CSV so that it has a single loc column containing longitude and latitude instead?
Place1,[-73.992964,40.739037]
I'm frequently going to be dealing with CSVs that have latitude and longitude stored in independent columns, so I'm hoping to find a way to do this with mongoimport.
Upvotes: 2
Views: 2129
Reputation: 4078
I experienced a similar problem, and I solved it by performing a short preprocessing pass with sed to convert the CSV into suitable JSON (using the newer GeoJSON point format). Note that the keys and string values need to be quoted for the output to be valid JSON:
sed 's/\([^,]*\),\([0-9.-]*\),\([0-9.-]*\)/{ "place": "\1", "location": { "type": "Point", "coordinates": [ \3, \2 ] } }/' <data.csv >data.json
An explanation of what's going on:
sed // Execute the sed command
's/ // Use substitute mode
\([^,]*\) // Match a string containing anything except a ',' [1]
, // Match a single ',' (the separator)
\([0-9.-]*\) // Match any combination of numbers, '.' or '-' [2]
, // Match a single ',' (the separator)
\([0-9.-]*\) // Match any combination of numbers, '.' or '-' [3]
/{ "place": "\1", "location": { "type": "Point", "coordinates": [ \3, \2 ] } }/'
// Replace the match with the appropriate JSON format, inserting
// parts of the matched pattern ([1],[2],[3])
<data.csv // Perform the command on the contents of the data.csv file
>data.json // Output the results to a data.json file
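For example, the sample row from the question,
Place 1,73.992964,40.739037
becomes:
{ "place": "Place 1", "location": { "type": "Point", "coordinates": [ 40.739037, 73.992964 ] } }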
I found sed to be pretty efficient: even with a CSV file containing ~8 million lines, this conversion took only around a minute.
It is then a simple task to import the newly created JSON file using mongoimport, as shown in Marc's answer.
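With the converted file, the import command would look something like this (the database and collection names here are illustrative):
$ ./mongoimport -d test -c places --type json --file data.json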
Upvotes: 1
Reputation: 5548
mongoimport has very limited capabilities, and in situations such as this the official recommendation is to write a custom script that parses your CSV file line by line and creates documents exactly the way you would like them represented.
For a geospatial index to be created, the location information must be stored under a single key, as described in the "Some examples:" section at the top of the Geospatial Indexing documentation: http://www.mongodb.org/display/DOCS/Geospatial+Indexing
Importing the data directly from the .csv file creates documents like this:
doc1.csv:
place, lat, lon
Place 1,73.992964,40.739037
$ ./mongoimport -d test -c a --type csv --headerline --file doc1.csv
> db.a.find()
{ "_id" : ObjectId("4f7602d70c873ff911798fd3"), "place" : "Place 1", "lat" : 73.992964, "lon" : 40.739037 }
Unfortunately, it is not possible to create a geospatial index on the document above.
By way of experimentation, I attempted to import a .csv file with data in the second format that you described, without success.
doc2.csv:
place, loc
Place1,[-73.992964,40.739037]
$ ./mongoimport -d test -c b --type csv --headerline --file doc2.csv
> db.b.find()
{ "_id" : ObjectId("4f7602e40c873ff911798fd4"), "place" : "Place1", "loc" : "[-73.992964", "field2" : "40.739037]" }
As a further experiment, I converted the .csv document into JSON format and imported that, which worked:
doc3.json:
{name:"Place1" , loc:[-73.992964,40.739037]}
$ ./mongoimport -d test -c c --type json --file doc3.json
> db.c.find()
{ "_id" : ObjectId("4f7604570c873ff911798fd5"), "name" : "Place1", "loc" : [ -73.992964, 40.739037 ] }
However, if you are going to write a script to convert all of your .csv files into .json format anyway, you are probably better off writing a custom script that imports the .csv files directly into your collection instead.
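A minimal sketch of such a script in Python with pymongo is shown below; the file, database, and collection names are illustrative, and it assumes the three-column layout of doc1.csv above:

import csv

from pymongo import MongoClient

client = MongoClient()                # assumes a mongod running on localhost:27017
places = client.test.places          # illustrative database/collection names

with open("doc1.csv", newline="") as f:
    reader = csv.reader(f)
    next(reader)                      # skip the header line (place, lat, lon)
    for place, lat, lon in reader:
        places.insert_one({
            "name": place.strip(),
            # geospatial indexes expect longitude first, latitude second
            "loc": [float(lon), float(lat)],
        })

places.create_index([("loc", "2d")])  # geospatial index on the combined field

Parsing the file yourself also makes it straightforward to handle quoting, malformed rows, or other column layouts that mongoimport cannot deal with.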
Upvotes: 2