Guido

Reputation: 37

arangoimp of graph from CSV file

I have a network scan in a TSV file that contains data in a form like the following sample:

source IP      target IP       source port    target port
192.168.84.3   192.189.42.52   5868           1214
192.168.42.52  192.189.42.19   1214           5968
192.168.4.3    192.189.42.52   60680          22
....  
192.189.42.52  192.168.4.3     22             61969

Is there an easy way to import this using arangoimp into the (pre-created) edge collection networkdata?

Upvotes: 0

Views: 395

Answers (1)

dothebart

Reputation: 6067

You could use the TSV importer directly if it didn't fail converting the IPs (fixed in ArangoDB 3.0); until then you need a bit more conversion logic to get valid CSV. The edge attribute conversion options then turn the first two columns into valid _from and _to attributes during the import.

The column names in the header line shouldn't contain blanks, and the columns should really be separated by tabs, or the file should have a constant number of columns. The header line also needs to contain a _from and a _to field.
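
The trouble with blanks in the column names shows up when you count whitespace-separated fields per line. A minimal sketch, assuming a two-line sample in the layout from the question (the variable names sample and counts are made up for illustration):

```shell
# Two lines in the layout from the question: header plus one data row
sample='source IP      target IP       source port    target port
192.168.84.3   192.189.42.52   5868           1214'

# Print the number of whitespace-separated fields on each line
counts=$(printf '%s\n' "$sample" | awk '{ print NF }')
printf '%s\n' "$counts"
```

The header splits into twice as many fields as the data row, because each column name contains a blank. That is why the names have to be rewritten before the blanks can be treated as separators.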

To make it work, pipe the file through sed to get valid CSV and proper column names, like this:

cat /tmp/test.tsv  | \
  sed -e "s;source IP;_from;g;" \
      -e "s;target IP;_to;" \
      -e "s; port;Port;g" \
      -e 's;  *;",";g' \
      -e 's;^;";' \
      -e 's;$;";' | \
   arangoimp --file - \
      --type csv \
      --from-collection-prefix sourceHosts \
      --to-collection-prefix targetHosts \
      --collection "ipEdges" \
      --create-collection true \
      --create-collection-type edge

Sed with these regular expressions creates an intermediate representation that looks like this:

"_from","_to","sourcePort","targetPort"
"192.168.84.3","192.189.42.52","5868","1214"

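If you want to inspect that intermediate CSV before involving arangoimp, the sed stage can be run on its own. A rough sketch, assuming the same column layout as in the question; the function name to_edge_csv is hypothetical, and compared to the pipeline above it additionally strips trailing whitespace and accepts tabs as well as blanks between columns:

```shell
# Hypothetical helper: only the sed stage of the pipeline, no arangoimp
to_edge_csv() {
  sed -e 's;source IP;_from;' \
      -e 's;target IP;_to;' \
      -e 's; port;Port;g' \
      -e 's;[[:space:]]*$;;' \
      -e 's;[[:space:]][[:space:]]*;",";g' \
      -e 's;^;";' \
      -e 's;$;";'
}

# Small sample in the question's layout (blank-separated columns)
sample='source IP      target IP       source port    target port
192.168.84.3   192.189.42.52   5868           1214'

# Convert the sample and print the resulting CSV
csv=$(printf '%s\n' "$sample" | to_edge_csv)
printf '%s\n' "$csv"
```

Once the output looks right, the same sed expressions can be put back in front of arangoimp as shown above.
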
The generated edges will look like this:

{ 
  "_key" : "21056", 
  "_id" : "ipEdges/21056", 
  "_from" : "sourceHosts/192.168.84.3", 
  "_to" : "targetHosts/192.189.42.52", 
  "_rev" : "21056", 
  "sourcePort" : "5868", 
  "targetPort" : "1214" 
} 

Upvotes: 2
