OntoBase is a new core for data storage systems that spans many storage system features and is an essential part of the Ontologic File System (OntoFS) component.
OntoBase is the result of in-depth research into data storage and retrieval systems that comprised the analysis of around 60
- Resource Description Framework (RDF),
- object-oriented, and
stores and whole Data Base Management Systems (DBMSs), as well as related basic
- hardware techniques,
- software techniques,
- data structures, and
One result was that no existing storage system fulfills our requirements and that, owing to the complexity involved, the selection of a specific storage system is based on virtually nothing but rules of thumb. Indeed, we observed once again that the selection problem in the field of data storage and retrieval systems is very similar to the decision problems in the fields of Product Lifecycle Management (PLM) and Computer Aided Engineering (CAE), which Knowledge-Based Engineering (KBE) attempts to support.
OntoBase is also a result of our research and development activities in the OntoLab that led to the:
by their integration into a log-structured, hash-based, row- and column-oriented data storage and retrieval system.
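One common reading of "log-structured, hash-based" is an append-only log of records combined with a hash index that maps each key to the offset of its newest record. The following is a minimal in-memory sketch under that assumption. The class `LogHashStore`, its methods, and the record layout are hypothetical and are not taken from OntoBase or from the Log-Structured Hash-Based File System.

```python
import struct


class LogHashStore:
    """Hypothetical sketch: append-only log plus in-memory hash index."""

    _HEADER = struct.Struct("<II")  # key length, value length

    def __init__(self):
        self._log = bytearray()   # stands in for an append-only file
        self._index = {}          # key -> offset of its newest record

    def put(self, key: bytes, value: bytes) -> None:
        offset = len(self._log)
        # Records are only ever appended; older versions stay in the log.
        self._log += self._HEADER.pack(len(key), len(value)) + key + value
        self._index[key] = offset

    def get(self, key: bytes):
        offset = self._index.get(key)
        if offset is None:
            return None
        key_len, value_len = self._HEADER.unpack_from(self._log, offset)
        start = offset + self._HEADER.size + key_len
        return bytes(self._log[start:start + value_len])

    def log_size(self) -> int:
        return len(self._log)


store = LogHashStore()
store.put(b"/etc/motd", b"v1")
store.put(b"/etc/motd", b"v2")   # supersedes v1 without overwriting it
print(store.get(b"/etc/motd"))
```

In such a design a write is a sequential append and a read is one hash lookup plus one seek. The price is that superseded records accumulate until some form of compaction reclaims them, which this sketch leaves out.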
Log-Structured, Hash-based, Row- and Column-oriented
Also at that time, we had already begun to rework, extend, and adapt one of the related graph-oriented techniques, called parallel sliding windows, which is merely a general and simple way to partition, process, and compute on a graph, so that it can be integrated with our log-structured, hash-based, row- and column-oriented file database system, which is itself a product of the previously described integration of our Reflection DB and our Log-Structured Hash-Based File System. In a first step, this graph-oriented technique has to be extended so that it offers the functionality of, for example, our Reflection DB. This means adding some indices to the basic data structure that its developers have missed in the last two years, and updating the related functions so that the technique can handle the incoming edges (in-edges) and outgoing edges (out-edges) of graph vertices efficiently at the same time. In a second step, the implementation of the technique has to be harmonized with the basic data structures and algorithms of our Log-Structured Hash-Based File System. Putting it all together, we get a log-structured, hash-based, row- and column-oriented data storage system that can now handle graphs, which represent the files and directories, and also the functions, data, and metadata where useful, and that constitutes one of the foundations of our OntoFS.
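The basic partitioning idea behind parallel sliding windows can be illustrated with a small in-memory sketch. It assumes the commonly described scheme: vertices are split into intervals, each shard holds the edges whose destination lies in one interval, and every shard is sorted by source vertex. The function names `build_shards` and `load_interval` are hypothetical, and the sketch shows only the basic technique, not the extended variant with additional indices described above.

```python
from bisect import bisect_left


def build_shards(edges, num_vertices, num_intervals):
    """Partition (src, dst) edges by destination interval, sorted by source."""
    size = -(-num_vertices // num_intervals)  # ceiling division
    shards = [[] for _ in range(num_intervals)]
    for src, dst in edges:
        shards[dst // size].append((src, dst))
    for shard in shards:
        shard.sort()
    return shards, size


def load_interval(shards, size, p):
    """Return the in-edges and out-edges of the vertices in interval p."""
    lo, hi = p * size, (p + 1) * size
    # In-edges: exactly the shard that belongs to the interval.
    in_edges = list(shards[p])
    # Out-edges: because every shard is sorted by source, they form one
    # contiguous "window" per shard, found here by binary search.
    out_edges = []
    for shard in shards:
        start = bisect_left(shard, (lo, -1))
        end = bisect_left(shard, (hi, -1))
        out_edges.extend(shard[start:end])
    return in_edges, out_edges


edges = [(0, 1), (1, 2), (2, 0), (3, 1), (0, 3)]
shards, size = build_shards(edges, num_vertices=4, num_intervals=2)
in_edges, out_edges = load_interval(shards, size, 0)
print(in_edges, out_edges)
```

On disk, each window would be a contiguous block read. Processing the intervals in order makes the windows slide forward through the shards, which is where the technique gets its name.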
Based on Ontology
Instead of implementing one specific data storage and retrieval system, as is common in the fields of relational and NoSQL databases for example, we followed our initial plan, which included the definition of a related ontology and the use of Knowledge-Based Software Engineering (KBSE). KBSE is based on the SoftBionic (SB) functionalities of our OntoBot and OntoBlender, which comprise Artificial Intelligence (AI) and Machine Learning (ML) capabilities for example, and is used for the design of our OntoFS, its dynamic refinement and management, and the Create, Read, Update, and Destroy (CRUD) procedures performed on its data stores at run-time.
In this way our Ontologic Systems (OSs) give a user or a machine exactly the right data storage and retrieval systems, which are based on the:
- different variants of
  - volatile memory and
  - Non-Volatile Memory (NVM),
on the side of the hardware,
- various versions of
  - hash table,
  - eXtensible Array (XArray),
  - B*-tree, and
  - Log-Structured Merge (LSM) tree (LSM-tree),
  - k-dimensional binary tree (k-d tree or kd-tree),
  - k-dimensional B-tree (k-d-B-tree or kdb-tree),
  - k-dimensional B+-tree (kdB+-tree),
  - k-dimensional B*-tree (kdB*-tree), and
  - k-dimensional Be-tree (kdBe-tree or kde-tree),
- further suitable data structures, and
- polymorphic combinations
on the side of the software, and
- requirements like
  - memory mapping,
  - shadow paging,
  - in-memory execution,
  - fractional cascading,
  - concurrency (including MultiVersion Concurrency Control (MVCC)),
  - Compare-And-Swap (CAS),
  - non-blocking and locking,
  - vertical partitioning,
  - horizontal partitioning/sharding,
  - transaction processing (Atomicity, Consistency, Isolation, and Durability (ACID)),
  - logging (including Write-Ahead Logging (WAL)),
  - Copy-On-Write (COW),
  - soft update,
  - fault tolerance and high-availability,
  - query language,
  - extension language,
  - access control even of each single store cell,
  - run-time interchangeable data management engine,
  - and others
on the lowest level, and comprise
- object-oriented, and
- graph (including hypergraph)
- document, and
stores to special
on the highest level, and
- deductive databases,
- OnLine Analytical Processing (OLAP), and
- OnLine Transaction Processing (OLTP)
on the application side.
is also supported by a dynamic disk partition manager and process communication infrastructures to handle specific demands of a data store.
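The requirement of a run-time interchangeable data management engine, listed above, can be pictured as a store front end that delegates to a replaceable engine object. The sketch below is a minimal illustration under that assumption. The classes `HashEngine`, `SortedEngine`, and `Store` and the method `swap_engine` are hypothetical names that do not describe the actual OntoBase interfaces.

```python
from bisect import bisect_left, insort


class HashEngine:
    """Hypothetical engine backed by a hash table: fast point lookups."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def items(self):
        return list(self._data.items())


class SortedEngine:
    """Hypothetical engine keeping keys ordered: supports range scans."""

    def __init__(self):
        self._keys = []
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def items(self):
        return [(k, self._values[k]) for k in self._keys]

    def range(self, lo, hi):
        """Return all pairs with lo <= key < hi in key order."""
        i, j = bisect_left(self._keys, lo), bisect_left(self._keys, hi)
        return [(k, self._values[k]) for k in self._keys[i:j]]


class Store:
    """Front end whose engine can be replaced while the store is in use."""

    def __init__(self, engine):
        self.engine = engine

    def put(self, key, value):
        self.engine.put(key, value)

    def get(self, key):
        return self.engine.get(key)

    def swap_engine(self, new_engine):
        # Migrate all existing pairs, then switch over.
        for key, value in self.engine.items():
            new_engine.put(key, value)
        self.engine = new_engine


store = Store(HashEngine())
for key, value in [("b", 2), ("a", 1), ("c", 3)]:
    store.put(key, value)
store.swap_engine(SortedEngine())   # workload now needs range scans
print(store.engine.range("a", "c"))
```

A real system would also have to migrate incrementally and coordinate with concurrent transactions. This sketch copies everything in one step and ignores both.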
Currently, the prototype of OntoFS consists of building blocks that feature elements of:
- polymorphic key-value stores (hash and B+ tree)
- multi-model, tuple and graph stores
- semi-polymorphic multi-model, key-value and document stores (hash and B+ tree)
- multi-model, key-value, document, and graph stores
- column, relational stores
- row, relational stores
- row, column stores
and much more.
- Featherstitch File System,
- Filesystem in Userspace (FUSE),
- Cooperative File System (CFS) (distributed consistent hash with Chord),
- Magma (distributed hash on FUSE),
- Log-structured Hash-based File System (LogHashFS or LHFS),
- redisfs (Redis on FUSE),
- Libsqlfs (SQLite3 on FUSE),
As the list above readily shows, selecting a combination of building blocks for a data storage system is a highly complex task. It can be done by hand or, as we propose and highly recommend, with the support of techniques from the fields of Artificial Intelligence (AI), with its branches Machine Learning (ML) and Knowledge Engineering (KE), Evolutionary Algorithms (EA), specifically its branch Genetic Programming (GP), and Computational Creativity. Luckily, OntoLix and OntoLinux are Ontologic Systems (OSs), specifically reflective Hightech Operating Systems (HOSs), and feature the OntoBase, OntoBot, and OntoBlender software components.
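To make the selection problem concrete, the following sketch treats it as a set-cover task: each building block offers a set of features, and a greedy rule picks blocks until all requirements are covered. This is a deliberately simple stand-in, far below the AI-, ML-, and GP-based support recommended above. The block names, the feature sets, and the function `select_blocks` are hypothetical and chosen only for illustration.

```python
def select_blocks(catalogue, required):
    """Greedily choose blocks until every required feature is covered."""
    remaining = set(required)
    chosen = []
    while remaining:
        # Pick the block that covers the most still-missing features;
        # ties go to the block listed first in the catalogue.
        name, features = max(
            catalogue.items(), key=lambda item: len(item[1] & remaining)
        )
        gain = features & remaining
        if not gain:
            raise ValueError(f"no block provides: {sorted(remaining)}")
        chosen.append(name)
        remaining -= gain
    return chosen


# Hypothetical catalogue; the feature names only echo terms used above.
catalogue = {
    "kv_block": {"key-value", "hash", "b+tree"},
    "graph_block": {"tuple", "graph"},
    "doc_block": {"key-value", "document", "hash", "b+tree"},
    "columnar_block": {"row", "column"},
}

print(select_blocks(catalogue, {"graph", "document", "column"}))
```

Greedy set cover does not always find the smallest combination, and it ignores costs, incompatibilities between blocks, and workload characteristics. Those are the aspects that make a knowledge-based or learning-based approach attractive.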
Besides the support for basic graph processing and analytics, OntoBase supports virtually every other field of application as well.
Compared with the Web Ontology Language, our Ontologic Web Language (OWL) offers direct language support for n-ary relationships, which can be handled by the hypergraph feature of OntoBase and the directly connected features of pattern recognition and querying, and term and graph rewriting, of the OntoBot and OntoBlender components.
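An n-ary relationship maps naturally onto a hyperedge, that is, a single edge that connects more than two vertices. The sketch below stores each relation as a labeled hyperedge with named roles and answers simple pattern queries over it. The class `Hypergraph`, its methods, and the example data are hypothetical and do not reflect the syntax of the Ontologic Web Language or the OntoBase API.

```python
from collections import defaultdict


class Hypergraph:
    """Hypothetical sketch: n-ary relations stored as labeled hyperedges."""

    def __init__(self):
        self._edges = []                     # edge id -> (label, {role: node})
        self._incidence = defaultdict(set)   # node -> ids of its hyperedges

    def add_relation(self, label, **roles):
        edge_id = len(self._edges)
        self._edges.append((label, dict(roles)))
        for node in roles.values():
            self._incidence[node].add(edge_id)
        return edge_id

    def match(self, label, **pattern):
        """Return the role bindings of every relation that fits the pattern."""
        return [
            roles
            for edge_label, roles in self._edges
            if edge_label == label
            and all(roles.get(role) == node for role, node in pattern.items())
        ]

    def relations_of(self, node):
        """Return the ids of all hyperedges in which the node takes part."""
        return sorted(self._incidence.get(node, ()))


hg = Hypergraph()
# One ternary fact is one hyperedge; no auxiliary node is needed.
hg.add_relation("purchase", buyer="alice", seller="bob", item="book")
hg.add_relation("purchase", buyer="carol", seller="bob", item="pen")
print(hg.match("purchase", seller="bob"))
```

With binary edges only, the same fact would have to be reified as an extra node with three outgoing edges. Direct n-ary support removes that indirection.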
Further supported fields are Machine Learning (ML), Data Mining (DM), and Natural Language Processing (NLP), with vector space modeling and topic modeling provided by our OntoBlender and its features of term frequency-inverse document frequency (tf-idf), Locality-Sensitive Hashing (LSH; random projection), Latent Semantic Indexing (LSI), Latent Dirichlet Allocation (LDA), and more, including their distributed parallel versions.
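As an illustration of the vector space modeling mentioned above, the following sketch computes tf-idf weights for a small tokenized corpus. It assumes one common variant, with term frequency as the relative count within a document and inverse document frequency as the natural logarithm of N / df. The function `tf_idf` is a hypothetical name and not part of OntoBlender.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dictionary per tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append(
            {
                term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return weights


corpus = [
    ["graph", "store", "graph"],
    ["row", "store"],
    ["column", "store"],
]
for vector in tf_idf(corpus):
    print(vector)
```

A term that occurs in every document, such as "store" here, receives a weight of zero, while a term concentrated in one document is weighted up. Other variants smooth or rescale these factors differently.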