From SPIN to SHACL

In July 2017, the W3C ratified the Shapes Constraint Language (SHACL) as an official W3C Recommendation. SHACL was strongly influenced by SPIN and can be regarded as its legitimate successor. This document explains how the two languages relate and shows that nearly every SPIN feature has a direct equivalent in SHACL, while SHACL improves on the features explored by SPIN.

History

SPIN started to evolve as a feature in TopQuadrant's TopBraid Composer product around 2008. The driving observation behind SPIN was that SPARQL queries can be stored together with RDF data models in RDF graphs to define executable semantics of classes and their members. In particular, SPIN included the properties spin:constraint (to link a class with constraint checks that all instances of the class need to fulfill) and spin:rule (to link a class with inferencing rules that construct new information from the statements about the instances). These simple concepts turned out to be surprisingly useful for a number of practical problem areas, and filled some gaps that were not addressed by the other semantic web standards at the time.

Over the years, SPIN was extended into a more powerful ecosystem and included the ability to encapsulate SPARQL queries so that instead of having to write SPARQL queries, users could simply instantiate a so-called template in a higher-level RDF vocabulary. This encapsulation mechanism was also introduced as a mechanism to define new SPARQL functions, allowing users to assemble SPARQL queries out of reusable building blocks. In 2011 SPIN was submitted as a W3C Member Submission, which was a way of signaling to the W3C that there is industry interest in the general problem areas covered by SPIN, including the ability to perform closed-world constraint checks on RDF-based data.

In 2014, the W3C formed the RDF Data Shapes Working Group, which was chartered to develop a new language for expressing constraints on RDF data. As in all W3C working groups, members from a wide variety of interested organizations joined, each with individual requirements, preferences and proposed approaches. SPIN was taken as one of the inputs into this standardization process, alongside other W3C submissions. The design philosophy behind the new standard, called SHACL, was to provide a high-level vocabulary called SHACL Core and an extension mechanism that allows SPARQL queries to be associated with classes and other resources, similar to how SPIN worked.

The SHACL standard is primarily specified in the SHACL W3C Recommendation document, which covers both SHACL Core and the SHACL-SPARQL extension mechanism. This document covers what the working group had been chartered to deliver. Several working group members, however, were also interested in producing two additional documents, both heavily inspired by SPIN: the SHACL Advanced Features and the SHACL JavaScript Extensions.

These two documents were also published as a result of the SHACL working group, but do not have the same official status as the main specification. This was largely due to time constraints (the WG was about to expire before all work could be finished) but also because some features such as rules and the JavaScript extensions were out of scope for the official charter of the WG. Although the features defined by the WG notes do not have the same visibility as the main SHACL features, there are already implementations available including the open source projects SHACL API based on Apache Jena and SHACL-JS API in JavaScript. The plan is that these documents may become the backbone of official standards in a future working group.

Feature Comparison

The following sub-sections walk through the main feature areas of SPIN and how they compare to corresponding SHACL features. This table summarizes the equivalences:

SPIN Feature | SHACL Feature
Constraints in SPARQL (spin:constraint, sp:Construct) | Shapes with sh:sparql
Constraints in high-level vocabulary (spl:Attribute, spl:minCount etc) | Shapes with SHACL Core properties (sh:property, sh:minCount etc)
Inference rules in SPARQL (spin:rule, sp:Construct) | Inference rules in SPARQL (sh:rule, sh:construct)
Inference rules in high-level vocabulary (spin:rule, sp:ConstructTemplate) | Inference rules with Node Expressions (sh:rule, sh:path, sh:filterShape etc)
User-defined SPARQL functions (spin:Function, spin:body) | User-defined SPARQL functions (sh:Function, sh:select)
Magic Properties (spin:MagicProperty) | approx: Triple rules (sh:rule, sh:predicate etc)
JavaScript Support (SPINx) (spinx:javaScriptCode) | User-defined JavaScript functions (sh:Function, sh:jsFunctionName)
RDF Syntax of SPARQL queries (sp:Select, sp:Filter etc) | Limited to text strings and prefixes (sh:select, sh:prefixes etc)

Constraints in SPARQL

In SPIN, constraints are attached to classes using the property spin:constraint. The values of this property are either SPARQL queries or suitable query templates. This example states that each instance of the class ex:Parent must be at least 18 years old, using a SPARQL constraint:

ex:Parent
	a rdfs:Class ;
	rdfs:label "Parent" ;
	rdfs:subClassOf ex:Person ;
	spin:constraint
		[ a sp:Ask ;
			sp:text """
				# must be at least 18 years old
				ASK WHERE {
					?this ex:age ?age .
					FILTER (?age < 18) .
				}"""
		] .

A direct translation of this into SHACL-SPARQL would look like this:

ex:Parent
	a rdfs:Class, sh:NodeShape ;
	rdfs:label "Parent" ;
	rdfs:subClassOf ex:Person ;
	sh:sparql
		[  sh:message "Must be at least 18 years old, but is {?age}" ;
			sh:prefixes ex: ;
			sh:select """
				SELECT $this ?age
				WHERE {
					$this ex:age ?age .
					FILTER (?age < 18) .
				}"""
		] .

Apart from using different properties, the translation from SPIN to SHACL is straightforward. Note that SHACL recommends the use of $ instead of ? for variables that are pre-bound when the query executes, such as $this, which is the currently validated instance.
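
To illustrate what the SHACL version reports, here is a sketch of a data graph with one offending instance and the validation result an engine could produce for it. The resource ex:Alice and her age are made up for this example, and the exact set of properties in the report may vary between engines:

# Hypothetical data: a parent who is younger than 18
ex:Alice
	a ex:Parent ;
	ex:age 16 .

# Sketch of the resulting validation report
[	a sh:ValidationReport ;
	sh:conforms false ;
	sh:result [
		a sh:ValidationResult ;
		sh:resultSeverity sh:Violation ;
		sh:focusNode ex:Alice ;
		# {?age} in sh:message is replaced by the binding of ?age
		sh:resultMessage "Must be at least 18 years old, but is 16" ;
		sh:sourceConstraintComponent sh:SPARQLConstraintComponent ;
		sh:sourceShape ex:Parent ;
	]
] .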

The design of SHACL incorporates lessons learned from SPIN. For example, while SPIN requires the use of rather awkward CONSTRUCT queries to produce more than simple true/false responses, SHACL uses simpler SELECT queries that may return extra variables, which are then picked up for message generation using sh:message. For scenarios where SPIN CONSTRUCTs would produce additional triples, SHACL's Advanced Features include so-called Annotation Properties.
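
As a sketch of how annotation properties can be used, the following variation of the age constraint copies the offending age into each validation result. The property ex:offendingAge is a hypothetical name chosen for this example; sh:resultAnnotation, sh:annotationProperty and sh:annotationVarName are taken from the Advanced Features document:

ex:Parent
	a rdfs:Class, sh:NodeShape ;
	sh:sparql
		[	sh:message "Must be at least 18 years old" ;
			sh:prefixes ex: ;
			# Copy the binding of ?age into each result as ex:offendingAge
			sh:resultAnnotation [
				sh:annotationProperty ex:offendingAge ;
				sh:annotationVarName "age" ;
			] ;
			sh:select """
				SELECT $this ?age
				WHERE {
					$this ex:age ?age .
					FILTER (?age < 18) .
				}"""
		] .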

Constraints in High-Level Vocabulary

Since not everyone wants to express constraints in SPARQL, SPIN includes a facility called Templates which can be used to encapsulate SPARQL-based logic into parameterizable building blocks. As a popular example, the SPIN standard library includes the template spl:Attribute that can be used as in the following example:

ex:Person
	a rdfs:Class ;
	spin:constraint [
		a spl:Attribute ;
		spl:predicate ex:dateOfBirth ;
		spl:maxCount 1 ;
		spl:valueType xsd:date ;
	] .

SHACL uses almost identical syntax, just with different properties:

ex:Person
	a rdfs:Class, sh:NodeShape ;
	sh:property [
		sh:path ex:dateOfBirth ;
		sh:maxCount 1 ;
		sh:datatype xsd:date ;
	] .

In both languages, the high-level vocabulary such as spl:maxCount and sh:maxCount is backed by SPARQL queries that implement executable semantics. This means that in both languages, the constraint vocabulary is self-descriptive and therefore consistently extensible. The equivalent of SPIN templates in SHACL is called Constraint Components. Constraint components are a bit more flexible in that it is possible to mix multiple constraint types (such as sh:minCount and sh:minLength) in the same shape definition, while SPIN templates always require new instances and multiple spin:constraint triples.
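
A minimal sketch of such a mixed shape definition follows. The property ex:firstName is an assumption made for this example; the point is that a single property shape carries several constraint types at once:

ex:Person
	a rdfs:Class, sh:NodeShape ;
	sh:property [
		sh:path ex:firstName ;
		# Three different constraint components in one property shape
		sh:minCount 1 ;
		sh:minLength 2 ;
		sh:datatype xsd:string ;
	] .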

SHACL offers much greater flexibility with respect to the application target of constraints (and rules). While SPIN is limited to classes, SHACL shapes can be applied to either classes or sets of nodes derived by other means. See the various target mechanisms in SHACL, including the custom targets from the Advanced Features document. Note, however, that a shape applies to the instances of a class only if it either has an explicit sh:targetClass triple pointing at that class, or the class itself is also declared as an instance of sh:NodeShape. Hint: you can declare every class to be an instance of sh:NodeShape by adding the triple rdfs:Class rdfs:subClassOf sh:NodeShape to your shapes graph.
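
The following sketch shows two of the standard target mechanisms. The shape names ex:PersonShape and ex:ChildValueShape are hypothetical; in both cases the shape is a separate resource and the class does not need to be a sh:NodeShape itself:

# Applies to all instances of ex:Person via an explicit class target
ex:PersonShape
	a sh:NodeShape ;
	sh:targetClass ex:Person ;
	sh:property [
		sh:path ex:dateOfBirth ;
		sh:maxCount 1 ;
		sh:datatype xsd:date ;
	] .

# Applies to every node that appears as object of an ex:child triple,
# regardless of its rdf:type
ex:ChildValueShape
	a sh:NodeShape ;
	sh:targetObjectsOf ex:child ;
	sh:class ex:Person .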

Inference Rules in SPARQL

SPIN inferencing rules based on SPARQL CONSTRUCT queries have a direct translation into SHACL, too. The following example SPIN rule infers the grandparents of a person:

ex:Person
	a rdfs:Class ;
	rdfs:label "Person" ;
	rdfs:subClassOf owl:Thing ;
	spin:rule
		[  a sp:Construct ;
			sp:text """
				CONSTRUCT {
					?this ex:grandParent ?grandParent .
				}
				WHERE {
					?parent ex:child ?this .
					?grandParent ex:child ?parent .
				}"""
		] .

The SHACL Advanced Features document defines the syntax for SPARQL Rules:

ex:Person
	a rdfs:Class, sh:NodeShape ;
	rdfs:label "Person" ;
	rdfs:subClassOf owl:Thing ;
	sh:rule
		[  a sh:SPARQLRule ;
			sh:prefixes ex: ;
			sh:construct """
				CONSTRUCT {
					$this ex:grandParent ?grandParent .
				}
				WHERE {
					?parent ex:child $this .
					?grandParent ex:child ?parent .
				}"""
		] .

SHACL rules provide a bit more flexibility in that they can be combined with shape-based preconditions, see sh:condition. Like constraints, SHACL rules also have a more powerful targeting mechanism, not limited to classes like SPIN.
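
As a sketch of such a precondition, the rule below only fires for persons that conform to a condition shape. The shape ex:LivingPersonShape and the property ex:dateOfDeath are assumptions introduced for this example:

ex:Person
	a rdfs:Class, sh:NodeShape ;
	sh:rule
		[	a sh:SPARQLRule ;
			# Only focus nodes conforming to this shape are processed
			sh:condition ex:LivingPersonShape ;
			sh:prefixes ex: ;
			sh:construct """
				CONSTRUCT {
					$this ex:grandParent ?grandParent .
				}
				WHERE {
					?parent ex:child $this .
					?grandParent ex:child ?parent .
				}"""
		] .

# Hypothetical condition: the person has no recorded date of death
ex:LivingPersonShape
	a sh:NodeShape ;
	sh:property [
		sh:path ex:dateOfDeath ;
		sh:maxCount 0 ;
	] .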

Note that the current version of the SHACL Advanced Features document only defines how to do a single iteration of rules. This was primarily done due to time constraints in the working group, and it avoids potential complications such as infinite loops. Hopefully, future versions of this specification will address this gap in a more flexible way. If you need rules that depend on each other, one solution is to put them into an order, using sh:order. Alternatively, you can assume that most implementations will do the obvious thing and offer iteration as an option.
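
A sketch of two dependent rules ordered with sh:order follows. The property ex:parent is an assumption made for this example, and the sketch relies on the engine making the triples inferred by the first rule visible to the second one:

ex:Person
	a rdfs:Class, sh:NodeShape ;
	sh:rule
		[	a sh:SPARQLRule ;
			sh:order 1 ;		# executed first
			sh:prefixes ex: ;
			sh:construct """
				CONSTRUCT {
					$this ex:parent ?parent .
				}
				WHERE {
					?parent ex:child $this .
				}"""
		] ;
	sh:rule
		[	a sh:SPARQLRule ;
			sh:order 2 ;		# executed second, uses the inferred ex:parent triples
			sh:prefixes ex: ;
			sh:construct """
				CONSTRUCT {
					$this ex:grandParent ?grandParent .
				}
				WHERE {
					$this ex:parent ?parent .
					?parent ex:parent ?grandParent .
				}"""
		] .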

Inference Rules in High-Level Vocabulary

The values of spin:rule can also be instances of a SPIN template, making it possible to define rules using higher-level vocabularies than SPARQL queries. A good example of this is the SPINMap framework which enables users to define mappings between classes using a graphical notation. Each transformation becomes a SPIN rule, and may use function calls to modify data on the fly, for example to translate xsd:string literals into xsd:date literals.

SHACL does not have the exact same mechanism, but arguably something better: SHACL Node Expressions (Advanced Features). That syntax offers expressivity similar to that of SPARQL without being SPARQL. Node expressions form evaluation chains, where the output of one expression is used as input to the next. For example, an equivalent node expression for the "age less than 18" query above would be

	[
		shf:lessThan ( [ sh:path my:age ] 18 )
	]

which means that the SHACL engine will first fetch the value(s) of my:age and then apply them as first argument to the function called shf:lessThan. In the example above, shf:lessThan is assumed to be a generic SHACL function (with implementations in SPARQL, JavaScript or any other target language), abstracting the < operator into a reusable building block.
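
For illustration, a SPARQL-backed definition of such a function could look like the following sketch. The shf: namespace, the parameter names shf:arg1 and shf:arg2 and the use of sh:order to fix the argument order are assumptions of this example, not part of any published library:

shf:lessThan
	a sh:SPARQLFunction ;
	rdfs:comment "Returns true if the first argument is less than the second." ;
	sh:parameter [
		sh:path shf:arg1 ;		# bound to $arg1 in the query
		sh:order 0 ;
	] ;
	sh:parameter [
		sh:path shf:arg2 ;		# bound to $arg2 in the query
		sh:order 1 ;
	] ;
	sh:returnType xsd:boolean ;
	sh:select """
		SELECT ($arg1 < $arg2 AS ?result)
		WHERE {
		}""" .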

This syntax has been designed specifically for representing rules for data transformation tasks and is amenable to visual editing tools such as mapping diagrams. Node expressions can be used in the subject, predicate or object position of SHACL Triple Rules. The following example is a rewrite of the "grandparent" example from above, but using Triple Rules:

ex:Person
	a rdfs:Class, sh:NodeShape ;
	rdfs:label "Person" ;
	rdfs:subClassOf owl:Thing ;
	sh:rule
		[  a sh:TripleRule ;
			sh:subject sh:this ;
			sh:predicate ex:grandParent ;
			sh:object [
				sh:nodes [
					sh:path [ sh:inversePath ex:child ] ;
				] ;
				sh:path [ sh:inversePath ex:child ] ;
			] ;
		] .

Above, the rule will infer the values of ex:grandParent for any instance of ex:Person. As the first step it matches the values of ex:child (in the inverse direction) at the person. As the second step it takes the output from the first step as the subjects of triple matches with ex:child as predicate (again in the inverse direction), returning a stream of nodes which then become objects of the triples that have the original person as subject and ex:grandParent as predicate. The Node Expressions framework is quite powerful and is based on years of experience with SPIN and SPINMap in particular. Note however that Node Expressions do not have a concept of variables as SPARQL does, so that in some cases it will need to be combined with SPARQL-based SHACL rules.

User-Defined (SPARQL) Functions

SPIN functions encapsulate a snippet of SPARQL code into a reusable building block so that it can be called like any other SPARQL function in SPARQL FILTER and BIND expressions. This framework has proven to be extremely important for everyday work, because it makes queries much more maintainable and less redundant.

The following example defines a new SPIN function ex:cardinality that gets the number of values of a given predicate at a given subject:

ex:cardinality
	a spin:Function ;
	rdfs:subClassOf 	spin:Functions ;
	rdfs:comment "Gets the number of values of a given property at a given subject." ;
	rdfs:label "cardinality"^^xsd:string ;
	spin:returnType xsd:integer ;
	spin:constraint [
		a spl:Argument ;
		rdfs:comment "The subject to get the cardinality at." ;
		spl:predicate sp:arg1 ;
	] ;
	spin:constraint [
		a spl:Argument ;
		rdfs:comment "The property to get the cardinality of." ;
		spl:predicate sp:arg2 ;
	] ;
	spin:body [
		a sp:Select ;
		sp:text """
			SELECT (COUNT(?object) AS ?result)
			WHERE {
				?arg1 ?arg2 ?object .
			}"""
	] .

SHACL-SPARQL includes an almost identical feature in the Advanced Features document. The example above would become:

ex:cardinality
	a sh:SPARQLFunction ;
	rdfs:comment "Gets the number of values of a given property at a given subject." ;
	rdfs:label "cardinality"^^xsd:string ;
	sh:returnType xsd:integer ;
	sh:parameter [
		sh:description "The subject to get the cardinality at." ;
		sh:path ex:arg1 ;
	] ;
	sh:parameter [
		sh:description "The property to get the cardinality of." ;
		sh:path ex:arg2 ;
	] ;
	sh:prefixes ex: ;
	sh:select """
		SELECT (COUNT(?object) AS ?result)
		WHERE {
			$arg1 $arg2 ?object .
		}""" .
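
As a usage sketch, the function can then be called like a built-in SPARQL function, for example inside a SHACL-SPARQL constraint. The property ex:parent and the limit of two are assumptions made for this example, and the engine needs to support SHACL functions for the query to work:

ex:Person
	a rdfs:Class, sh:NodeShape ;
	sh:sparql
		[	sh:message "Must not have more than two parents" ;
			sh:prefixes ex: ;
			sh:select """
				SELECT $this
				WHERE {
					# Reuse the function instead of repeating the COUNT sub-query
					FILTER (ex:cardinality($this, ex:parent) > 2) .
				}"""
		] .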

Magic Properties

SPIN includes a generalization of the SPIN functions framework, for cases in which a function needs to return more than one value (row) or more than one variable (column). These so-called magic properties rely on a feature that is supported by many SPARQL implementations without being part of the official SPARQL standard, namely the ability to treat certain predicates in a SPARQL WHERE clause so that their values are computed at query execution time.

This situation made it difficult for SPIN's magic properties to be promoted into SHACL. The closest thing that SHACL offers is inferencing, in particular with Triple Rules. In a nutshell, if you need to return multiple values for a magic predicate, define a triple rule that has sh:this as sh:subject and the given predicate as sh:predicate, while the sh:object is a Node Expression that delivers the actual values. Based on this information, tools can automatically compute the inferred values, for example as magic properties in SPARQL queries, or (as implemented by TopBraid's web products) as dynamically computed properties on display forms.

There is no equivalent of magic SPIN properties that return multiple columns in SHACL. In these cases, the magic property would need to be broken up into multiple individual properties.

JavaScript Support

Less visible and less known than its other features, SPIN also includes a vocabulary for declaring new SPARQL functions that are backed by JavaScript code. This so-called SPINx framework is a highly flexible mechanism to perform almost arbitrary computations of values, based on whatever can be implemented by JavaScript libraries.

The SPINx framework inspired a whole new specification, the SHACL JavaScript Extensions, which were published by the SHACL WG as a Note similar to the Advanced Features document. SHACL-JS is far more powerful than SPINx, as it is basically a complete alternative to SHACL-SPARQL and allows users to express constraints, constraint components, rules, and custom targets backed by JavaScript logic. As a result of this framework, it is possible to write SHACL code that can be executed by either a SPARQL processor or a JavaScript engine (e.g. client-side or using Node.js). An open source SHACL-JS API provides the backbone of an online SHACL Playground, all implemented in JavaScript.

An overview of how SHACL-JS relates to the other components of SHACL can be found in the SHACL Features article.

RDF Syntax of SPARQL Queries

SPIN includes two ways of representing SPARQL queries in RDF. The most comprehensive variant breaks the structure of a SPARQL query into individual syntax tree elements, as a deep RDF structure. For example, the SPARQL query

	# must be at least 18 years old
	ASK WHERE {
		?this my:age ?age .
		FILTER (?age < 18) .
	}

can be represented in SPIN RDF syntax as

	[ 	a sp:Ask ;
	  	rdfs:comment "must be at least 18 years old" ;
		sp:where (
			[  sp:object sp:_age ;
				sp:predicate my:age ;
				sp:subject spin:_this
			]
			[  a sp:Filter ;
				sp:expression
				[ 	sp:arg1 sp:_age ;
					sp:arg2 18 ;
					a sp:lt
				]
			])
	]

This syntax has the notable advantage that the query structure itself becomes a data structure that can be programmatically queried or generated. The obvious downside is that it is hard for humans to read or to enter into textual RDF files. SPIN therefore includes a simpler variation based on text strings:

	[ a sp:Ask ;
		sp:text """
			# must be at least 18 years old
			ASK WHERE {
				?this my:age ?age .
				FILTER (?age < 18) .
			}"""
	]

The SHACL equivalent of this would be the following:

	[
		sh:ask """
			# must be at least 18 years old
			ASK WHERE {
				?this my:age ?age .
				FILTER (?age < 18) .
			}""" ;
		sh:prefixes my: ;
	]

SHACL cleans up a rather murky hack used by SPIN related to namespace prefixes. With sp:text the assumption was that the queries can only be parsed in the graph they are defined in, and the required prefix declarations are not really part of the RDF graph standard (although most implementations maintain them). Instead, SHACL introduces a prefix management vocabulary, which may also be useful for other use cases besides shapes.
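
A minimal sketch of such a prefix declaration follows. The namespace URI is made up for this example; the resource referenced by sh:prefixes (here my:) carries sh:declare triples that map each prefix to its namespace:

my:
	a owl:Ontology ;
	# Makes the prefix "my" available to queries that use sh:prefixes my:
	sh:declare [
		sh:prefix "my" ;
		sh:namespace "http://example.org/my#"^^xsd:anyURI ;
	] .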

SHACL does not support a deeply structured SPARQL RDF syntax like SPIN, if only because it would have been too tedious and time-consuming to define all its details together with test cases etc in the context of a working group that included many members who were not keen on the SPARQL features overall. W3C politics require compromises on all sides. Having fewer features also lowers the cost for implementers.

See the section on Inference Rules in High-Level Vocabulary for a related SHACL alternative to the SPIN SPARQL Syntax.

Outlook

Although SPIN was created by TopQuadrant, it has since been adopted by various other groups and products. This section only reflects TopQuadrant's viewpoint and expectations.

SHACL supersedes SPIN in almost every respect. As outlined in this document, SHACL includes basically all features of SPIN, and more. Most importantly, SHACL is an official W3C Recommendation, which makes it far more likely that other vendors will support it. As a result of this, TopQuadrant will prefer SHACL over SPIN for new features. SPIN will remain supported indefinitely, as there is a lot of production code that heavily uses SPIN for various purposes.


Holger Knublauch, last updated 2017-08-01