Parallel Retrieval
How can we improve the performance of an application that dynamically retrieves Linked Data?
Context
An application that draws on data from the web will typically be retrieving a number of different resources. This is especially true if it is using the Follow Your Nose pattern to discover data.
Solution
Use several workers to make parallel GET requests, with each worker writing into a shared RDF graph.
Example(s)
Most good HTTP client libraries support parallelization of HTTP requests, e.g. PHP's curl_multi functions or Ruby's typhoeus library.
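As a concrete illustration, here is a minimal sketch in Python (rather than PHP or Ruby) using the requests and rdflib libraries. The URLs, Accept header, and worker count are illustrative assumptions, not part of the pattern itself:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

import requests
from rdflib import Graph

# Illustrative placeholder URLs, not real endpoints.
urls = [
    "http://example.org/data/alice",
    "http://example.org/data/bob",
    "http://example.org/data/carol",
]

shared_graph = Graph()
# rdflib graphs are not guaranteed to be thread-safe,
# so writes are serialized with a lock.
graph_lock = threading.Lock()

def fetch_into_graph(url):
    """Worker: GET one resource and merge its triples into the shared graph."""
    response = requests.get(url, headers={"Accept": "text/turtle"}, timeout=10)
    response.raise_for_status()
    # Parse into a private graph first, then merge under the lock.
    local = Graph()
    local.parse(data=response.text, format="turtle")
    with graph_lock:
        shared_graph += local

# Issue all GET requests in parallel; total wall-clock time approaches
# that of the slowest single response.
with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(fetch_into_graph, urls))

print(f"Merged {len(shared_graph)} triples from {len(urls)} resources")
```

Parsing each response into a private graph and merging under a lock keeps the workers from writing to the shared graph concurrently while still letting the network requests overlap.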
Discussion
Parallelization of HTTP requests can greatly reduce retrieval times: in the best case the total time falls to that of the single longest GET request.
By combining this approach with Resource Caching of the individual responses, an application can maintain a local cache of the most frequently requested data, which is then combined and parsed into a single RDF graph that drives application behavior.
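To illustrate how the two patterns combine, here is a hedged sketch of a worker that consults a simple in-memory cache before issuing a GET. The cache structure and TTL are illustrative assumptions; a real deployment would instead honour the HTTP caching headers (Expires, Cache-Control, ETags) discussed under Resource Caching:

```python
import threading
import time

import requests

# Hypothetical in-memory cache: URL -> (fetched_at, response body).
_cache = {}
_cache_lock = threading.Lock()
CACHE_TTL_SECONDS = 300  # illustrative freshness window

def cached_get(url):
    """Return a resource body, reusing a cached copy if it is still fresh."""
    now = time.time()
    with _cache_lock:
        entry = _cache.get(url)
        if entry and now - entry[0] < CACHE_TTL_SECONDS:
            return entry[1]
    # Cache miss or stale entry: fetch and store the fresh response.
    response = requests.get(url, headers={"Accept": "text/turtle"}, timeout=10)
    response.raise_for_status()
    with _cache_lock:
        _cache[url] = (now, response.text)
    return response.text
```

Each worker in the parallel pool would call cached_get instead of issuing a raw GET, so repeated requests for popular resources are served locally while only cache misses incur network latency.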