A competitive benchmark of NitrosBase Universal DBMS versus a well-known Graph DBMS shows that NitrosBase is significantly faster.
Generally, it is tens or hundreds of times faster. In the worst case (query 3b) it is at least 8 times faster. In the best case (query 11) it is 300 000 times faster.
This document reports NitrosBase Universal DBMS's SP²Bench benchmark results.
SP²Bench comprises a data-generator for arbitrarily large documents, which builds upon a library model close to a real-world application scenario. The benchmark queries implement meaningful requests on top of this data, thereby testing typical SPARQL operator constellations and RDF access patterns.
For more information on SP²Bench see the project’s site:
http://dbis.informatik.uni-freiburg.de/index.php?project=SP2B
All experiments were conducted on a computer with an Intel® Core™ i5-3570 CPU @ 3.40GHz and 16GB of DDR3 1600 MHz physical memory.
BENCHMARK DATASETS
We used the SP²Bench generator to produce test RDF documents comprising 50k, 250k, 1M, 5M and 25M triples, then ran the full set of queries against each generated dataset.
BENCHMARK QUERIES
The benchmark queries vary in general characteristics such as selectivity, query and output size, and the types of joins involved.
A logarithmic scale is used in the charts so that results differing by orders of magnitude remain comparable.
Shorter bars indicate better performance.
Q1
Return the year of publication of 'Journal 1 (1940)'.
This simple query returns exactly one result (for arbitrarily large documents).
SELECT ?yr WHERE { ?journal rdf:type bench:Journal . ?journal dc:title "Journal 1 (1940)"^^xsd:string . ?journal dcterms:issued ?yr }
Q2
Extract all inproceedings with properties 'dc:creator', 'bench:booktitle', 'dc:title', 'swrc:pages', 'dcterms:partOf', 'rdfs:seeAlso', 'foaf:homepage', 'dcterms:issued', and optionally 'bench:abstract', including these properties.
This query implements a bushy graph pattern. It contains a single, simple OPTIONAL expression, and accesses large strings (i.e. the abstracts). Result size grows with database size, and a final result ordering is necessary due to operator ORDER BY.
SELECT ?inproc ?author ?booktitle ?title ?proc ?ee ?page ?url ?yr ?abstract
WHERE {
  ?inproc rdf:type bench:Inproceedings .
  ?inproc dc:creator ?author .
  ?inproc bench:booktitle ?booktitle .
  ?inproc dc:title ?title .
  ?inproc dcterms:partOf ?proc .
  ?inproc rdfs:seeAlso ?ee .
  ?inproc swrc:pages ?page .
  ?inproc foaf:homepage ?url .
  ?inproc dcterms:issued ?yr
  OPTIONAL { ?inproc bench:abstract ?abstract }
}
ORDER BY ?yr
Q3
Select all articles with property (a) swrc:pages, (b) swrc:month, or (c) swrc:isbn.
This query tests FILTER expressions with varying selectivity. According to Table I, the FILTER expression in Q3a is not very selective (i.e. retains about 92.61% of all articles). Data access through a secondary index for Q3a is probably not very efficient, but might work well for Q3b, which selects only 0.65% of all articles. The filter condition in Q3c is never satisfied, as no articles have swrc:isbn predicates.
Q3a
SELECT ?article WHERE { ?article rdf:type bench:Article. ?article ?property ?value FILTER (?property=swrc:pages) }
Q3b
Select all articles with property swrc:month.
Like Q3a, but "swrc:month" instead of "swrc:pages"
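Written out in full, Q3b might look as follows. This is a sketch obtained by substituting the property in the Q3a FILTER. The PREFIX declarations do not appear in this report; the IRIs shown are assumptions based on the commonly used RDF, SWRC and SP²Bench vocabularies and should be checked against the generated data.

```sparql
# Prefix IRIs are assumptions; the report does not list them.
PREFIX rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX bench: <http://localhost/vocabulary/bench/>
PREFIX swrc:  <http://swrc.ontoware.org/ontology#>

# Q3a with swrc:pages replaced by swrc:month.
SELECT ?article
WHERE {
  ?article rdf:type bench:Article .
  ?article ?property ?value
  FILTER (?property = swrc:month)
}
```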
Q3c
Select all articles with property swrc:isbn.
Like Q3a, but "swrc:isbn" instead of "swrc:pages"
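Q3c can be sketched the same way. As noted above, no article carries an swrc:isbn predicate, so the query is expected to return an empty result. The prefix IRIs are again assumptions that are not given in this report.

```sparql
# Prefix IRIs are assumptions; the report does not list them.
PREFIX rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX bench: <http://localhost/vocabulary/bench/>
PREFIX swrc:  <http://swrc.ontoware.org/ontology#>

# Q3a with swrc:pages replaced by swrc:isbn.
SELECT ?article
WHERE {
  ?article rdf:type bench:Article .
  ?article ?property ?value
  FILTER (?property = swrc:isbn)
}
```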
Q4
Select all distinct pairs of article author names for authors that have published in the same journal.
Q4 contains a comparably long graph chain, i.e. variables ?name1 and ?name2 are linked through articles that (different) authors have published in the same journal. The query computes very large result sets.
SELECT DISTINCT ?name1 ?name2
WHERE {
  ?article1 rdf:type bench:Article .
  ?article2 rdf:type bench:Article .
  ?article1 dc:creator ?author1 .
  ?author1 foaf:name ?name1 .
  ?article2 dc:creator ?author2 .
  ?author2 foaf:name ?name2 .
  ?article1 swrc:journal ?journal .
  ?article2 swrc:journal ?journal
  FILTER (?name1 < ?name2)
}
Q5a
Return the names of all persons that occur as author of at least one inproceeding and at least one article.
Queries Q5a and Q5b test different variants of joins. Q5a implements an implicit join on author names, which is encoded in the FILTER condition, while Q5b explicitly joins the authors on variable ?person.
SELECT DISTINCT ?person ?name
WHERE {
  ?article rdf:type bench:Article .
  ?article dc:creator ?person .
  ?inproc rdf:type bench:Inproceedings .
  ?inproc dc:creator ?person2 .
  ?person foaf:name ?name .
  ?person2 foaf:name ?name2
  FILTER (?name = ?name2)
}
Q5b
Return the names of all persons that occur as author of at least one inproceeding and at least one article (same as Q5a).
SELECT DISTINCT ?person ?name
WHERE {
  ?article rdf:type bench:Article .
  ?article dc:creator ?person .
  ?inproc rdf:type bench:Inproceedings .
  ?inproc dc:creator ?person .
  ?person foaf:name ?name
}
Q6
Return, for each year, the set of all publications authored by persons that have not published in years before.
Q6 implements negation, expressed through a combination of operators OPTIONAL, FILTER, and BOUND. The idea of the construction is that the block outside the OPTIONAL expression computes all publications, while the inner one constitutes earlier publications from authors that appear outside. The outer FILTER expression then retains publications for which ?author2 is unbound, i.e. exactly the publications of those authors that have not published in earlier years.
SELECT ?yr ?name ?doc
WHERE {
  ?class rdfs:subClassOf foaf:Document .
  ?doc rdf:type ?class .
  ?doc dcterms:issued ?yr .
  ?doc dc:creator ?author .
  ?author foaf:name ?name
  OPTIONAL {
    ?class2 rdfs:subClassOf foaf:Document .
    ?doc2 rdf:type ?class2 .
    ?doc2 dcterms:issued ?yr2 .
    ?doc2 dc:creator ?author2
    FILTER (?author = ?author2 && ?yr2 < ?yr)
  }
  FILTER (!bound(?author2))
}
Q7
Return the titles of all papers that have been cited at least once, but not by any paper that has not been cited itself.
This query implements double negation.
SELECT DISTINCT ?title
WHERE {
  ?class rdfs:subClassOf foaf:Document .
  ?doc rdf:type ?class .
  ?doc dc:title ?title .
  ?bag2 ?member2 ?doc .
  ?doc2 dcterms:references ?bag2
  OPTIONAL {
    ?class3 rdfs:subClassOf foaf:Document .
    ?doc3 rdf:type ?class3 .
    ?doc3 dcterms:references ?bag3 .
    ?bag3 ?member3 ?doc
    OPTIONAL {
      ?class4 rdfs:subClassOf foaf:Document .
      ?doc4 rdf:type ?class4 .
      ?doc4 dcterms:references ?bag4 .
      ?bag4 ?member4 ?doc3
    }
    FILTER (!bound(?doc4))
  }
  FILTER (!bound(?doc3))
}
Q8
Compute authors that have published with Paul Erdoes or with an author that has published with Paul Erdoes.
SELECT DISTINCT ?name
WHERE {
  ?erdoes rdf:type foaf:Person .
  ?erdoes foaf:name "Paul Erdoes"^^xsd:string .
  {
    ?doc dc:creator ?erdoes .
    ?doc dc:creator ?author .
    ?doc2 dc:creator ?author .
    ?doc2 dc:creator ?author2 .
    ?author2 foaf:name ?name
    FILTER (?author != ?erdoes && ?doc2 != ?doc && ?author2 != ?erdoes && ?author2 != ?author)
  } UNION {
    ?doc dc:creator ?erdoes .
    ?doc dc:creator ?author .
    ?author foaf:name ?name
    FILTER (?author != ?erdoes)
  }
}
Q9
Return incoming and outgoing properties of persons.
SELECT DISTINCT ?predicate
WHERE {
  {
    ?person rdf:type foaf:Person .
    ?subject ?predicate ?person
  } UNION {
    ?person rdf:type foaf:Person .
    ?person ?predicate ?object
  }
}
Q10
Return all subjects that stand in any relation to the person “Paul Erdoes”. In our scenario, the query can be reformulated as: return publications and venues in which “Paul Erdoes” is involved either as author or as editor.
SELECT ?subj ?pred WHERE { ?subj ?pred person:Paul_Erdoes }
Q11
Return (up to) 10 electronic edition URLs starting from the 51st publication, in lexicographical order.
SELECT ?ee WHERE { ?publication rdfs:seeAlso ?ee } ORDER BY ?ee LIMIT 10 OFFSET 50
Q12a
Return yes if a person is an author of at least one inproceeding and at least one article.
Q12a is Q5a expressed as an ASK query.
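As a sketch, Q12a can be obtained by keeping the Q5a graph pattern and replacing the SELECT clause with ASK, so the engine only has to report whether at least one match exists. The PREFIX declarations are not part of this report; the IRIs are assumptions based on the commonly used RDF, FOAF, Dublin Core and SP²Bench vocabularies.

```sparql
# Prefix IRIs are assumptions; the report does not list them.
PREFIX rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf:  <http://xmlns.com/foaf/0.1/>
PREFIX dc:    <http://purl.org/dc/elements/1.1/>
PREFIX bench: <http://localhost/vocabulary/bench/>

# Same graph pattern as Q5a; ASK returns a boolean instead of bindings.
ASK {
  ?article rdf:type bench:Article .
  ?article dc:creator ?person .
  ?inproc rdf:type bench:Inproceedings .
  ?inproc dc:creator ?person2 .
  ?person foaf:name ?name .
  ?person2 foaf:name ?name2
  FILTER (?name = ?name2)
}
```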
Q12b
Return yes if an author has published with “Paul Erdoes” or with an author that has published with “Paul Erdoes”.
Q12b is Q8 expressed as an ASK query.
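A possible full form of Q12b, sketched by wrapping the Q8 graph pattern in ASK. The prefix IRIs are assumptions that are not given in this report.

```sparql
# Prefix IRIs are assumptions; the report does not list them.
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc:   <http://purl.org/dc/elements/1.1/>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

# Same graph pattern as Q8; ASK returns a boolean instead of names.
ASK {
  ?erdoes rdf:type foaf:Person .
  ?erdoes foaf:name "Paul Erdoes"^^xsd:string .
  {
    # Co-authors of co-authors of Paul Erdoes.
    ?doc dc:creator ?erdoes .
    ?doc dc:creator ?author .
    ?doc2 dc:creator ?author .
    ?doc2 dc:creator ?author2 .
    ?author2 foaf:name ?name
    FILTER (?author != ?erdoes && ?doc2 != ?doc && ?author2 != ?erdoes && ?author2 != ?author)
  } UNION {
    # Direct co-authors of Paul Erdoes.
    ?doc dc:creator ?erdoes .
    ?doc dc:creator ?author .
    ?author foaf:name ?name
    FILTER (?author != ?erdoes)
  }
}
```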
Q12c
Asks for a single triple that is not present in the database.
ASK { person:John_Q_Public rdf:type foaf:Person }
The SP²Bench benchmark clearly demonstrates that NitrosBase Universal DBMS is tens or hundreds of times faster than a well-known Graph DBMS on most queries. In the worst case (query 3b) it is at least 8 times faster. In the best case (query 11) it is 300 000 times faster.