Using the API with R

statistics.gov.scot provides highly flexible programmatic access to its data through its SPARQL endpoint. This API can be used to automate report-writing, publish data visualisations, or build interactive tools for exploring the data. This guide describes how to pull data from statistics.gov.scot into the R statistical programming environment.

This section assumes a basic understanding of manipulating data in R, and uses the RStudio integrated development environment.

In this section, we will use R to make an API call to statistics.gov.scot and load the data into a data frame. From there, we can take advantage of R’s many functions and libraries, or even create interactive tools with Shiny.

To do this, we will use the SPARQL query that we developed in the SPARQL user guide to extract the percentage of children in P1 with no obvious dental decay by Health Board Area:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?areaname ?periodname ?value 
WHERE {
  ?obs <http://purl.org/linked-data/cube#dataSet> <http://statistics.gov.scot/data/child-dental-health> .
  ?obs <http://purl.org/linked-data/sdmx/2009/dimension#refArea> ?areauri .
  ?obs <http://purl.org/linked-data/sdmx/2009/dimension#refPeriod> ?perioduri .
  ?obs <http://statistics.gov.scot/def/measure-properties/ratio> ?value .
  ?areauri rdfs:label ?areaname .
  ?perioduri rdfs:label ?periodname .
}

We can paste this directly into the SPARQL endpoint and run it to get a table of results with the columns areaname, periodname and value.

To take this into R, we could simply download the results as a CSV file and read it into our project. However, the data on statistics.gov.scot is regularly updated, and calling the API directly means we always pull in the most up-to-date information.

To handle SPARQL APIs like the one on statistics.gov.scot, there is an R package available, called SPARQL.

The first thing to do, then, is to install the package from CRAN. In your RStudio console, type:

install.packages("SPARQL")

And then enable the package in your current project:

library(SPARQL)

The SPARQL package is now available for use in the project. To use the SPARQL function, we simply need to pass in two arguments: the web address of the endpoint, and the SPARQL query itself. The best way to do this is to create two variables and assign these values to them.

The endpoint variable is straightforward. This will be the same for each call we make against statistics.gov.scot:

endpoint <- "https://statistics.gov.scot/sparql"

Then we need to assign the query string to the query variable:

query <- "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?areaname ?periodname ?value 
WHERE {
  ?obs <http://purl.org/linked-data/cube#dataSet> <http://statistics.gov.scot/data/child-dental-health> .
  ?obs <http://purl.org/linked-data/sdmx/2009/dimension#refArea> ?areauri .
  ?obs <http://purl.org/linked-data/sdmx/2009/dimension#refPeriod> ?perioduri .
  ?obs <http://statistics.gov.scot/def/measure-properties/ratio> ?value .
  ?areauri rdfs:label ?areaname .
  ?perioduri rdfs:label ?periodname .
}"
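
If the same query shape is needed for more than one dataset, the string can be built rather than typed out each time. The following is a minimal sketch, not part of the site's own tooling: build_query is a hypothetical helper name, and it assumes other datasets follow the same URI patterns as child-dental-health (http://statistics.gov.scot/data/... for the dataset and .../def/measure-properties/... for the measure), which should be checked for each dataset.

```r
# Hypothetical helper: fills the dataset and measure names into the query above.
# Assumes the URI patterns used for child-dental-health also apply elsewhere.
build_query <- function(dataset, measure = "ratio") {
  sprintf(
    'PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?areaname ?periodname ?value
WHERE {
  ?obs <http://purl.org/linked-data/cube#dataSet> <http://statistics.gov.scot/data/%s> .
  ?obs <http://purl.org/linked-data/sdmx/2009/dimension#refArea> ?areauri .
  ?obs <http://purl.org/linked-data/sdmx/2009/dimension#refPeriod> ?perioduri .
  ?obs <http://statistics.gov.scot/def/measure-properties/%s> ?value .
  ?areauri rdfs:label ?areaname .
  ?perioduri rdfs:label ?periodname .
}',
    dataset, measure
  )
}

# Reproduces the query used in this guide.
query <- build_query("child-dental-health")
cat(query)
```

Keeping the query in one function means a change to its structure only has to be made in one place.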

This means we now have our endpoint and query string assigned to variables, so we can simply use the SPARQL function to get the data and load it into a dataframe:

scotqueryres <- SPARQL(endpoint, query)

Running this can take a few seconds, depending on the complexity of the query. The result, scotqueryres, is a list containing two items: a data frame called results, and an empty object called namespaces. To work with the data, we want to isolate the data frame. To do this, we just need to run this line:

scotdf <- scotqueryres$results
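
Because the call goes over the network, it can fail, for example if the endpoint is unreachable while an automated report is running. One way to handle this, sketched here under assumptions, is to wrap the call in tryCatch. The name fetch_results and its fetch argument are hypothetical and introduced only for illustration; by default the wrapper calls the SPARQL function used above.

```r
# Hypothetical wrapper: returns the results data frame, or NULL with a
# warning if the request fails. `fetch` defaults to the SPARQL package's
# SPARQL() function and can be swapped for a stand-in when testing offline.
fetch_results <- function(endpoint, query, fetch = SPARQL::SPARQL) {
  tryCatch(
    fetch(endpoint, query)$results,
    error = function(e) {
      warning("SPARQL request failed: ", conditionMessage(e), call. = FALSE)
      NULL
    }
  )
}

# Usage, with the endpoint and query variables defined earlier:
# scotdf <- fetch_results(endpoint, query)
# if (is.null(scotdf)) stop("No data returned; check the connection and query.")
```

Returning NULL rather than stopping lets the calling script decide whether to retry, fall back to a saved copy of the data, or halt.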

We can now use this data frame as we would any other in R: combine it with other datasets, map the data, or create an interactive data explorer.
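
As a small illustration of working with the results, the sketch below averages the value column for each area using base R. It runs on a placeholder data frame with the same three column names as the query results; the area names, periods and numbers are invented for illustration and are not real statistics. With live data, scotdf would take the place of example_df.

```r
# Placeholder data with the same columns as the query results.
# These values are invented for illustration and are NOT real statistics.
example_df <- data.frame(
  areaname   = c("Area A", "Area A", "Area B", "Area B"),
  periodname = c("Period 1", "Period 2", "Period 1", "Period 2"),
  value      = c("60.5", "62.5", "70.0", "72.0"),
  stringsAsFactors = FALSE
)

# Make sure the measure is numeric before doing arithmetic on it,
# in case it arrives as text.
example_df$value <- as.numeric(example_df$value)

# Mean value per area across all periods.
area_means <- aggregate(value ~ areaname, data = example_df, FUN = mean)
print(area_means)
```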

To continue exploring our datasets, return to statistics.gov.scot
