Madhav Gumma

Reputation: 65

Convert an undirected graph in CSV format to RDF format using SPARQL

I have a CSV file that stores a graph. It contains two columns: source vertex ID and destination vertex ID. The IDs are integers in the range [0, max_vertex-1]. I want to convert it into an RDF file. I know this kind of transformation can be done with a SPARQL query using the CONSTRUCT form, but I'm not sure how to write the query, since I don't have a subject, predicate, and object here. If all the edges have equal weight or no weight, i.e. it is just a simple undirected graph, can SPARQL be used to write a query that converts the CSV file to RDF?

If such a query exists, can someone help me write it? I don't know much SPARQL.

Upvotes: 0

Views: 1030

Answers (1)

Thomas

Reputation: 1030

It sounds like what you're trying to build is the intermediate layer between your data (the CSV file) and a graph (which you can then run SPARQL queries on). That step is often called triplification: the process of turning raw data into RDF triples.

One common way to do this is with Python's rdflib. As a sketch, load your CSV into Python and loop over the rows, constructing the appropriate triple for each row and adding it to the graph.

An immediate problem, as you mention, is that you don't have any predicates, which are an absolute requirement of the RDF data model (they are how you connect nodes). I would suggest finding an ontology with an appropriate term and using it to connect the nodes, or, if you're just exploring the data, making up your own term as I've done below.

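A rough sketch (the file name edges.csv and the column names column_1 and column_2 are placeholders for your own):

import csv
import rdflib

# Create the graph object which holds the triples
graph = rdflib.Graph()

# Assumes edges.csv has a header row naming the two columns
with open("edges.csv", newline="") as f:
    for row in csv.DictReader(f):
        # Source vertex, made-up predicate, destination vertex
        s = rdflib.URIRef(f'#/{row["column_1"]}')
        p = rdflib.URIRef("#connectsTo")
        o = rdflib.URIRef(f'#/{row["column_2"]}')
        graph.add((s, p, o))

# Write the triples out as Turtle
graph.serialize(destination="graph.ttl", format="turtle")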

From here, you can load graph.ttl into a graph store that supports RDF or run a separate reasoner over it.

To avoid any confusion about SPARQL: it is used to query existing graphs. A CONSTRUCT query takes an existing graph and returns a new graph, so it needs RDF data as its input rather than a CSV file.

Upvotes: 2
