RDF DATABASES – CASE STUDY AND PERFORMANCE EVALUATION
DOI:
https://doi.org/10.20319/mijst.2019.53.0114
Keywords:
RDF, Database, noSQL, Benchmarking, Big Data, Query Performance
Abstract
The Resource Description Framework (RDF) data representation model and the SPARQL query language have been at the core of semantic web technologies since the early 2000s. In this article, we evaluate three RDF storage technologies. Our motivation is to find a storage solution that can be used to process “big data” RDF sets. Our method is based on measuring query response times with large samples (hundreds of thousands of RDF documents, millions of RDF statements). We find that all three technologies perform much better than querying RDF data stored in files. However, with 300 000 documents, an aggregation query still takes more than 100 seconds in our environment, even with the fastest technology. As a further performance improvement, we test the same data and queries with MongoDB and demonstrate its performance (10 seconds instead of 100) and scalability (up to 1 000 000 documents). Despite these benefits, we must note that, because of its data representation and query limitations, MongoDB probably cannot serve as generic storage for all kinds of RDF documents.
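The measurement idea described above (timing an aggregation query over a growing set of RDF statements) can be illustrated with a minimal sketch. This is not the benchmark harness used in the article: the in-memory list of triples stands in for a real RDF store, and the predicate names (`ex:journal`, `ex:year`) and the function names are hypothetical, chosen only for illustration.

```python
import time
from collections import Counter
from statistics import median


def make_triples(n_docs):
    """Build a toy dataset: two (subject, predicate, object) statements per document."""
    triples = []
    for i in range(n_docs):
        doc = f"doc:{i}"
        # Hypothetical predicates; a real RDF set would use full IRIs.
        triples.append((doc, "ex:journal", f"journal:{i % 10}"))
        triples.append((doc, "ex:year", str(2000 + i % 20)))
    return triples


def count_docs_per_journal(triples):
    """Aggregation query: number of documents per journal."""
    return Counter(o for (_s, p, o) in triples if p == "ex:journal")


def time_query(query, data, repeats=5):
    """Run the query several times; return (last result, median wall-clock seconds)."""
    timings = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = query(data)
        timings.append(time.perf_counter() - start)
    return result, median(timings)


if __name__ == "__main__":
    data = make_triples(100_000)
    result, seconds = time_query(count_docs_per_journal, data)
    print(f"{len(data)} statements, median response time {seconds:.4f} s")
```

Taking the median over several repetitions is one way to reduce the effect of caching and background load on a single run. Against a real store, the `query` callable would wrap a SPARQL or MongoDB request instead of a Python loop.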
License
Copyright of Published Articles
Author(s) retain the article copyright and publishing rights without any restrictions.
All published work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.