Reputation: 37
I have a network scan in a TSV file that contains data like the following sample:
source IP target IP source port target port
192.168.84.3 192.189.42.52 5868 1214
192.168.42.52 192.189.42.19 1214 5968
192.168.4.3 192.189.42.52 60680 22
....
192.189.42.52 192.168.4.3 22 61969
Is there an easy way to import this using arangoimp into the (pre-created) edge collection networkdata?
Upvotes: 0
Views: 395
Reputation: 6067
The TSV importer would be the obvious choice, but it fails when converting the IPs (fixed in ArangoDB 3.0), so you need a bit more conversion logic to get valid CSV. You can then use the edge attribute conversion options to turn the first two columns into valid _from and _to attributes during the import.
The header line shouldn't contain column names with blanks in them, and the separator should really be tabs, or at least every line should have a constant number of columns. The header line also needs to name a _from and a _to field.
To make it work, you would pipe the above through sed to get valid CSV and proper column names, like this:
cat /tmp/test.tsv | \
  sed -e "s;source IP;_from;g;" \
      -e "s;target IP;_to;" \
      -e "s; port;Port;g" \
      -e 's;[[:space:]]\{1,\};",";g' \
      -e 's;^;";' \
      -e 's;$;";' | \
  arangoimp --file - \
      --type csv \
      --from-collection-prefix sourceHosts \
      --to-collection-prefix targetHosts \
      --collection "ipEdges" \
      --create-collection true \
      --create-collection-type edge
These sed expressions produce an intermediate representation that looks like this:
"_from","_to","sourcePort","targetPort"
"192.168.84.3","192.189.42.52","5868","1214"
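Before importing anything, it can help to run the conversion step on its own and look at the result. The following is only a sketch: the function name tsv_to_csv is made up for illustration, and it assumes the columns are separated by tabs or runs of spaces, so the separator pattern matches one or more whitespace characters.

```shell
# Hypothetical helper (name chosen for illustration): reads the scan on stdin
# and writes quoted CSV with _from/_to column names to stdout.
tsv_to_csv() {
  sed -e 's;source IP;_from;' \
      -e 's;target IP;_to;' \
      -e 's; port;Port;g' \
      -e 's;[[:space:]]\{1,\};",";g' \
      -e 's;^;";' \
      -e 's;$;";'
}

# Preview the header and one data row without touching the database.
printf 'source IP\ttarget IP\tsource port\ttarget port\n192.168.84.3\t192.189.42.52\t5868\t1214\n' \
  | tsv_to_csv
```

If the preview shows the two lines above, the same function can sit in front of arangoimp in place of the inline sed call.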
The generated edges will look like this:
{
  "_key" : "21056",
  "_id" : "ipEdges/21056",
  "_from" : "sourceHosts/192.168.84.3",
  "_to" : "targetHosts/192.189.42.52",
  "_rev" : "21056",
  "sourcePort" : "5868",
  "targetPort" : "1214"
}
Upvotes: 2