Reputation: 978
I'm having trouble working out the best way to do this: I can't figure out what dictionary structure to use to parse a CSV file and build dictionaries of unique nodes, each with its own properties, to then load into Neo4j.
Here is what I have so far:
import csv
from neo4j.v1 import GraphDatabase  # "from neo4j import GraphDatabase" in newer driver releases

driver = GraphDatabase.driver("bolt://localhost")
input_location = "/path/to/file/file.csv"
output_location = "/path/to/dir/"

def main():
    # one dict per node type, mapping each column name to the unique values seen so far
    origin = {"PORT": [], "NUMBER": [], "CALL_SIGN": []}
    ship = {"BOAT_NAME": [], "BOAT_NUMBER": []}
    shipper = {"NAME": [], "STREET": [], "CITY": [], "ZIPCODE": []}
    destination = {"COUNTRY": [], "CITY": []}
    node_type_list = [origin, ship, shipper, destination]
    with open(input_location, "rb") as ship_data:
        reader = csv.DictReader(ship_data, delimiter='|')
        print "parsing & uploading data\n"
        for row in reader:
            for node_type in node_type_list:
                for key in node_type:
                    dict_load(node_type, key, row[key])
    send_nodes_to_graph(node_type_list)

def dict_load(node_type, key, value):
    # append the value only if this column has not seen it yet
    try:
        if value not in node_type[key]:
            node_type[key].append(value)
    except Exception as e:
        print e

def send_nodes_to_graph(node_type_list):
    session = driver.session()
    session.run('''something_like_this:http://stackoverflow.com/questions/40332698/can-a-python-dictionary-be-passed-as-neo4j-literal-maps/40333843#40333843''')
    session.close()

if __name__ == '__main__':
    main()
My CSV looks like this (pipe-delimited, aligned here for readability):
COUNTRY | NUMBER | CALL_SIGN | PORT      | BOAT_NAME | BOAT_NUMBER | NAME      | STREET            | CITY           | ST | ZIPCODE
D REP   | 91487  | S DOMINGO | BALTIMORE | PESCADO   | 1276394     | JH FWEICH | 9874 LOMBARDO WAY | PORT ELIZABETH | NJ | 8348
D REP   | 91487  | S DOMINGO | VA BEACH  | TROPIC    | 9872347     | JH FWEICH | 9874 LOMBARDO WAY | PORT ELIZABETH | NJ | 8348
D REP   | 91487  | S DOMINGO | VA BEACH  | TROPIC    | 9872347     | JH FWEICH | 9874 LOMBARDO WAY | PORT ELIZABETH | NJ | 8348
D REP   | 91487  | S DOMINGO | VA BEACH  | CAPRICORN | 8761231     | JH FWEICH | 9874 LOMBARDO WAY | PORT ELIZABETH | NJ | 8348
My dict structure currently produces this:
origin {'NUMBER': ['91487'], 'CALL_SIGN': ['S DOMINGO'], 'PORT': ['BALTIMORE', 'VA BEACH']}
But I think it needs to look more like this in order to load only unique nodes into Neo4j:
origin {'91487': {'CALL_SIGN': 'S DOMINGO', 'PORT': 'BALTIMORE'}}
origin {'91487': {'CALL_SIGN': 'S DOMINGO', 'PORT': 'VA BEACH'}}
ship {'1276394': {'BOAT_NAME': 'PESCADO'}}
shipper {'JH FWEICH': {'STREET': '9874 LOMBARDO WAY', 'CITY':'PORT ELIZABETH'}}
etc....
Upvotes: 1
Views: 130
Reputation: 978
For clarity, I wanted to show the changes that @PatrickHaugh led me to. I wanted to give him credit for the logic behind the answer, so I accepted it and am adding the update here. His suggestion is below, followed by my actual version, which produces the correct structure.
origin = {"PORT", "NUMBER", "CALL_SIGN"}
ship = {"BOAT_NAME", "BOAT_NUMBER"}
shipper = {"NAME", "STREET", "CITY", "ZIPCODE"}
destination = {"COUNTRY", "CITY"}
node_type_list = [origin, ship, shipper, destination]
with open(input_location, "rb") as ship_data:
    reader = csv.DictReader(ship_data, delimiter='|')
    for row in reader:
        dict_list = [{row["NUMBER"]: {key: row[key] for key in sublist}} for sublist in node_type_list]
output:
[{'91487': {'CALL_SIGN': 'S DOMINGO', 'PORT': 'BALTIMORE'}}, {'91487': {'BOAT_NUMBER': '1276394', 'BOAT_NAME': 'PESCADO'}}, ...]
Correct output version:
# moved the keys around so that the first key is the identifier for the sublist
origin = ["NUMBER", "PORT", "CALL_SIGN"]
ship = ["BOAT_NUMBER", "BOAT_NAME"]
shipper = ["NAME", "STREET", "CITY", "ZIPCODE"]
destination = ["COUNTRY", "CITY"]
node_type_list = [origin, ship, shipper, destination]
with open(input_location, "rb") as ship_data:
    reader = csv.DictReader(ship_data, delimiter='|')
    for row in reader:
        # uses first element in sublist to identify dict section, uses following elements to populate dict properties
        dict_list = [{row[sublist[0]]: {key: row[key] for key in sublist[1:]}} for sublist in node_type_list]
output:
[{'91487': {'PORT': 'VA BEACH', 'CALL_SIGN': 'S DOMINGO'}}, {'8761231': {'BOAT_NAME': 'CAPRICORN'}}, {'JH FWEICH': {'CITY': 'PORT ELIZABETH'...}}]
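One thing to watch: `dict_list` is rebuilt for every row, so on its own it only ever holds the last row. Below is a minimal Python 3 sketch of how the same "first column is the identifier" idea could be accumulated across all rows so that each identifier is stored once. The names `SAMPLE`, `NODE_TYPES` and `collect_unique_nodes` are made up for illustration, and the sample data is just the rows from the question typed in with `|` delimiters.

```python
import csv
import io

# Hypothetical in-memory copy of the rows from the question, pipe-delimited.
SAMPLE = """\
COUNTRY|NUMBER|CALL_SIGN|PORT|BOAT_NAME|BOAT_NUMBER|NAME|STREET|CITY|ST|ZIPCODE
D REP|91487|S DOMINGO|BALTIMORE|PESCADO|1276394|JH FWEICH|9874 LOMBARDO WAY|PORT ELIZABETH|NJ|8348
D REP|91487|S DOMINGO|VA BEACH|TROPIC|9872347|JH FWEICH|9874 LOMBARDO WAY|PORT ELIZABETH|NJ|8348
D REP|91487|S DOMINGO|VA BEACH|TROPIC|9872347|JH FWEICH|9874 LOMBARDO WAY|PORT ELIZABETH|NJ|8348
D REP|91487|S DOMINGO|VA BEACH|CAPRICORN|8761231|JH FWEICH|9874 LOMBARDO WAY|PORT ELIZABETH|NJ|8348
"""

# The first column in each list is the identifier, the rest are properties.
NODE_TYPES = {
    "origin": ["NUMBER", "PORT", "CALL_SIGN"],
    "ship": ["BOAT_NUMBER", "BOAT_NAME"],
    "shipper": ["NAME", "STREET", "CITY", "ZIPCODE"],
    "destination": ["COUNTRY", "CITY"],
}


def collect_unique_nodes(lines, node_types=NODE_TYPES):
    """Return {node_type: {identifier: properties}} built from all rows."""
    nodes = {name: {} for name in node_types}
    for row in csv.DictReader(lines, delimiter="|"):
        for name, columns in node_types.items():
            identifier = row[columns[0]]
            props = {key: row[key] for key in columns[1:]}
            # setdefault keeps the first row seen for each identifier
            # and ignores later rows with the same identifier.
            nodes[name].setdefault(identifier, props)
    return nodes


nodes = collect_unique_nodes(io.StringIO(SAMPLE))
print(sorted(nodes["ship"]))  # ['1276394', '8761231', '9872347']
```

Because the key is the identifier alone, NUMBER 91487 ends up with only its first port (BALTIMORE) here. If an origin is meant to be unique per NUMBER and PORT together, the key would have to include both columns, for example as a tuple.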
Upvotes: 0
Reputation: 61032
Maybe try something like this?
origin = {"PORT", "NUMBER", "CALL_SIGN"}
ship = {"BOAT_NAME", "BOAT_NUMBER"}
shipper = {"NAME", "STREET", "CITY", "ZIPCODE"}
destination = {"COUNTRY", "CITY"}
node_type_list = [origin, ship, shipper, destination]
with open(input_location, "rb") as ship_data:
    reader = csv.DictReader(ship_data, delimiter='|')
    print "parsing & uploading data\n"
    for row in reader:
        dict_list = [{row["NUMBER"]: {key: row[key] for key in sublist}} for sublist in node_type_list]
This builds a list of dicts, each mapping the row's NUMBER to a portion of the input row. I'm not familiar with Neo4j, but hopefully this is closer to what you wanted. The output should look like:
[{'91487': {'CALL_SIGN': 'S DOMINGO', 'PORT': 'BALTIMORE'}}, {'91487': {'BOAT_NUMBER': '1276394', 'BOAT_NAME': 'PESCADO'}}, ...]
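For the Neo4j side, here is a rough Python 3 sketch of how an `{identifier: properties}` dict might be turned into a parameterized Cypher statement. Treat it as an assumption rather than a tested recipe: the helper `build_merge_statement`, the `Ship` label and the `id` property are all hypothetical names, and the `$rows` parameter syntax assumes a reasonably recent Neo4j version. Nothing here talks to a database; it only builds the query text and the parameters.

```python
def build_merge_statement(label, nodes_by_id):
    """Return (cypher, parameters) that would MERGE one node per identifier."""
    # Labels cannot be passed as Cypher parameters, so the label is
    # interpolated into the query text. Only use hard-coded, trusted labels.
    if not label.isidentifier():
        raise ValueError("unexpected label: %r" % label)
    cypher = (
        "UNWIND $rows AS row "
        "MERGE (n:%s {id: row.id}) "
        "SET n += row.props" % label
    )
    # One map per node: the identifier plus its property map.
    rows = [{"id": key, "props": props}
            for key, props in sorted(nodes_by_id.items())]
    return cypher, {"rows": rows}


# Hypothetical input in the shape produced by the comprehension above.
ships = {
    "1276394": {"BOAT_NAME": "PESCADO"},
    "9872347": {"BOAT_NAME": "TROPIC"},
}
cypher, params = build_merge_statement("Ship", ships)
print(cypher)
# With an open driver session this would be sent roughly as:
#     session.run(cypher, params)
```

Using `MERGE` on the identifier means running the load twice should not create duplicate nodes, which seems to match the "only unique nodes" goal in the question.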
Upvotes: 1