Eleni Kalaitzi

Reputation: 11

Load an ontology in R

I would like to find which values from a list exist in a specific ontology, using the BERTMap algorithm.

One of the first steps is to load the ontology in order to use it in the process.

As I am new to this, I asked ChatGPT to create an example and it gave me the code below. However, when I checked, the rdflib package has no Graph() function.

Here is the example:

install.packages(c("textTinyR", "rdflib"))
library(textTinyR)
library(rdflib)
uco_ontology <- "C:/Users/User/Desktop/uco_1_5.owl"  # Replace with the path to your UCO ontology file
graph <- rdflib::Graph()
rdflib::parse(graph, file = uco_ontology)

Is there any way to load an ontology into R? The ontology (.owl) file that the snippet expects can be downloaded from here.

Upvotes: 0

Views: 239

Answers (2)

Hervé Pagès

Reputation: 56

Maybe try import_owl() from the Bioconductor package simona.
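A minimal sketch of how that suggestion might be used, assuming simona is installed from Bioconductor and that import_owl() can read this particular file (not verified here). The helper match_values() and the vector my_values are hypothetical names introduced only for illustration, and the helper does plain case-insensitive exact matching, not BERTMap:

```r
# Hypothetical helper: keep the values that exactly match an ontology term,
# ignoring case. This is simple string matching, not BERTMap.
match_values <- function(values, terms) {
  values[tolower(values) %in% tolower(terms)]
}

owl_file  <- "C:/Users/User/Desktop/uco_1_5.owl"  # path from the question
my_values <- c("Malware", "Attack")               # hypothetical example values

# Only runs when simona is installed and the file is present.
if (requireNamespace("simona", quietly = TRUE) && file.exists(owl_file)) {
  dag   <- simona::import_owl(owl_file)   # parse the .owl file into an ontology DAG
  terms <- simona::dag_all_terms(dag)     # character vector of term identifiers
  # Assumption: the identifiers are directly comparable to the values;
  # they may be IDs or IRIs rather than human-readable labels.
  print(match_values(my_values, terms))
}
```

If the term identifiers turn out to be IRIs, the values would have to be compared against labels or the last IRI segment instead.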

Upvotes: 0

lampros

Reputation: 581

There are a few ways to parse the .owl file that you have. The vignette of the rdflib package might include more information.

require(jsonld)
require(rdflib)
require(xml2)
require(jsonlite)
require(textTinyR)

# the expected format based on the example .rdf file
doc <- system.file("extdata/example.rdf", package="redland")
tmp_doc = readLines(doc, warn = FALSE)
tmp_doc
# [1] "<?xml version=\"1.0\"?>"                                                  "  <rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""    
# [3] "  xmlns:dc=\"http://purl.org/dc/elements/1.1/\">"                         "    <rdf:Description rdf:about=\"http://www.johnsmith.com/\">"           
# [5] "    <dc:title>John Smith's Home Page</dc:title>"                          "    <dc:creator>John Smith</dc:creator>"                                 
# [7] "    <dc:description>The generic home page of John Smith</dc:description>" "  </rdf:Description> "                                                   
# [9] "</rdf:RDF>"  

# get the raw Github version of your mentioned .owl weblink
url_pth = 'https://raw.githubusercontent.com/Ebiquity/Unified-Cybersecurity-Ontology/master/uco_1_5.owl'

# The following does not work as it returns 0 'triples'

# read with xml2 and convert to json 
dat_xml = xml2::read_xml(url_pth) |>
  xml2::as_list() |>
  jsonlite::toJSON()

# use format 'jsonld'
rdf = rdflib::rdf_parse(doc = dat_xml, format = "jsonld")
rdf
# Total of 0 triples, stored in hashes
# -------------------------------

# The following approach returns parsed data

# download the .owl to a temporary file
tmp_xml <- tempfile(fileext = '.xml')
download.file(url = url_pth, destfile = tmp_xml)

# read the file so that each line is a separate item in the vector, trim whitespace on both sides of each item, then concatenate the items with a single space (a new line works too)
dat_xml = textTinyR::read_rows(input_file = tmp_xml) |>
  sapply(function(x) {
    x = trimws(x, which = 'both')
    x
  }) |>
  paste(collapse = ' ')

# this returns 'triples', but I'm not sure the total number is correct; you will have to verify that, as there are many 'librdf errors'
rdf_out = rdflib::rdf_parse(doc = dat_xml, format = "guess")
# librdf error - Using property attribute 'ontologyIRI' without a namespace is forbidden.
# librdf error - Using property attribute 'versionIRI' without a namespace is forbidden.
# librdf error - Using an attribute 'name' without a namespace is forbidden.
# ....

rdf_out
# Total of 57 triples, stored in hashes
# -------------------------------
# _:r1710140791r63392r12 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
# _:r1710140791r63392r17 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
# _:r1710140791r63392r25 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
# _:r1710140791r63392r18 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
# _:r1710140791r63392r32 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
# _:r1710140791r63392r30 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
# _:r1710140791r63392r29 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
# _:r1710140791r63392r13 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
# _:r1710140791r63392r24 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
# _:r1710140791r63392r15 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
# 
# ... with 47 more triples
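Once an rdf object has been parsed, the classes it contains can be listed with a SPARQL query through rdflib::rdf_query(). The sketch below uses a tiny inline RDF/XML document (the example.org IRI is made up for illustration) rather than the UCO file, so it only shows the mechanics and assumes rdflib is installed:

```r
# Tiny hand-written RDF/XML document with a single owl:Class (illustrative only)
rdf_xml <- '<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/onto#Malware"/>
</rdf:RDF>'

# SPARQL query: every subject whose rdf:type is owl:Class
sparql <- 'SELECT ?cls WHERE {
  ?cls <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class>
}'

# Only runs when rdflib is installed.
if (requireNamespace("rdflib", quietly = TRUE)) {
  rdf_small <- rdflib::rdf_parse(doc = rdf_xml, format = "rdfxml")
  classes   <- rdflib::rdf_query(rdf_small, sparql)  # one row per matching class
  print(classes)
}
```

The same query could be run against rdf_out above, although the librdf errors suggest that not all of the file was parsed.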

You might have better luck with BERTMap using the PyTorch-based DeepOnto package (https://krr-oxford.github.io/DeepOnto/). If R is required, you can use the 'reticulate' R package to work with it from within R.

Upvotes: 0
