
Database Discussions: October 2016


Welcome to the inaugural "Database Discussion" round table.  

Each month we'll post a thought-provoking question to several of our Toad World experts. Below are their thoughts, but we'd like your opinions, too! Join the conversation.

And if you have any questions on the subject from your own experience, please post them to be answered by the expert panel of contributors.

We'll even entice you to share.

Each month we'll randomly select one response to win a $25 Amazon gift card. 

Speak up - don't be silent!

 


 

Do you think a NoSQL/NonSQL database such as MongoDB will ever be able to replace Oracle?
 

Deepak Vohra


First, "NoSQL" does not mean no SQL at all; for that reason the term is also read as "not only SQL", and many NoSQL databases provide a SQL-like interface. What sets NoSQL databases apart is that they are based on a database model that is non-relational and schema-free. In contrast, relational databases such as Oracle Database store data in related tables with a fixed schema of predefined columns and column types, and use SQL (Structured Query Language) to access and query those tables. Schema-less (schema-free) implies that each row of data in a NoSQL database could have a different (or the same) set of columns and that the column data type is not fixed. The data models supported by NoSQL databases include the document store (for example MongoDB and Couchbase), the wide column store (for example Apache Cassandra), and the key-value store (for example Oracle NoSQL Database).
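The difference between a fixed schema and a schema-free model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SQLite (from the standard library) stands in for a relational database, and a plain list of dictionaries named collection stands in for a document store. Neither is the actual Oracle or MongoDB API, and the table, column, and field names are made up for the example.

```python
import sqlite3

# Relational side: the columns are declared before any row is inserted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO users (id, name, email) VALUES (1, 'Ann', 'ann@example.com')")

def insert_with_unknown_column():
    """Try to store a field the schema does not define; the database rejects it."""
    try:
        conn.execute("INSERT INTO users (id, name, twitter) VALUES (2, 'Bob', '@bob')")
        return True
    except sqlite3.OperationalError:
        # SQLite reports that the table has no such column.
        return False

# Document side: a hypothetical in-memory "collection" is just a list of dicts,
# so each document may carry its own set of fields and value types.
collection = []
collection.append({"_id": 1, "name": "Ann", "email": "ann@example.com"})
collection.append({"_id": 2, "name": "Bob", "twitter": "@bob", "tags": ["dba", "dev"]})

field_sets = [sorted(doc.keys()) for doc in collection]
```

The relational insert fails until the table is altered, whereas the second document is simply stored with its extra fields.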

What is the need for NoSQL?

Relational databases have been used for decades. Why is a new type of database, the NoSQL database, needed?

NoSQL databases were developed as a solution to the following requirements of modern web scale applications:

    • Increase in the volume of data stored about users and objects, also termed as big data.
    • Streaming data, for example the data generated by online transactions and user sessions.
    • Rate at which big data influx is increasing each year, exponentially.
    • Increase in the frequency at which the data is accessed.
    • Fluctuations in data usage.
    • Increased processing and performance required to handle big data.
    • Ultra-high availability.
    • The type of data is unstructured or semi-structured.

A subset of the aforementioned reasons is often referred to as the 3 Vs or 4 Vs of big data: Volume, Variety, Velocity, and Veracity. SQL-based relational databases were not designed to handle the scalability, agility, and performance requirements of modern applications that access and process big data in real time. While most RDBMS products provide scalability and high availability as features, NoSQL databases provide higher levels of both. Big data is growing exponentially. Concurrent users have grown from a few hundred or a few thousand to several million for applications running on the web. Neither the data nor the load is static: new data keeps being added after the initial data has been stored, and an application accessed by millions of users today will not necessarily be accessed by as many users for a predictable period of time. The number of users could drop to a few thousand within a day or a few days.

A relational database is traditionally based on a single-server architecture, which makes that single database a single point of failure (SPOF). For a highly available database, data must be distributed across a cluster of servers instead of relying on a single database. NoSQL databases provide the distributed, scalable architecture required for big data. "Distributed" implies that data in a NoSQL database is distributed across a cluster of servers. If one server becomes unavailable, another server is used. The "distributed" feature is a provision and not a requirement for a NoSQL database. A small-scale NoSQL database may consist of only one server.

The fixed schema data model used by relational databases makes it necessary to break data into small sets and store them in separate tables using table schemas. The process of decomposing large tables into smaller tables with relationships between tables is called database normalization. Normalized databases require table joins and complex queries to aggregate the required data. In contrast, the data models provided by NoSQL databases provide a denormalized database. Each document is complete unto itself and does not have any external references to other documents. Self-contained documents are easier to store, transfer, and query.
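The contrast between a normalized layout and a self-contained document can be made concrete with a small sketch. It assumes SQLite as a stand-in relational database and a plain Python dictionary as a stand-in document; the customers and orders tables and the customer_doc name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Ann');
    INSERT INTO orders VALUES (10, 1, 'keyboard');
    INSERT INTO orders VALUES (11, 1, 'monitor');
""")

# Normalized: a join is needed to reassemble a customer with their orders.
rows = conn.execute("""
    SELECT c.name, o.item
    FROM customers c JOIN orders o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

# Denormalized: a hypothetical document embeds the orders in the customer,
# so one lookup returns everything and no join is involved.
customer_doc = {
    "_id": 1,
    "name": "Ann",
    "orders": [{"id": 10, "item": "keyboard"}, {"id": 11, "item": "monitor"}],
}
items_from_doc = [order["item"] for order in customer_doc["orders"]]
```

Both forms yield the same items; the relational form gets there through a join, the document form through a single read.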

Advantages of NoSQL Databases

In this section we shall cover the advantages of NoSQL databases.

Scalability

NoSQL databases are easily scalable, which provides an elastic data model. Why is scalability important? Suppose you are running a database with a fixed capacity and the web site traffic fluctuates, sometimes rising much in excess of the capacity, sometimes falling below the capacity. A fixed capacity database won't be able to serve the requests of the load in excess of the capacity, and if the load is less than the capacity the capacity is not being utilized fully. Scalability is the ability to scale the capacity to the workload. Two kinds of scalability options are available: horizontal scalability and vertical scalability. With horizontal scalability or scaling-out, new servers/machines are added to the database cluster. With vertical scalability or scaling-up, the capacity of the same server or machine is increased. Vertical scalability has several limitations.

    • Requires the database to be shut down so that additional capacity may be added, which incurs a downtime.
    • A single server has an upper limit.
    • A single server is a single point of failure. If the single server fails, the database becomes unavailable.

While relational databases traditionally scale vertically, NoSQL databases are designed to scale horizontally. Horizontal scalability does not have the limitations that vertical scalability does. Additional server nodes may be added to a cluster without a dependency on the other nodes in the cluster. The capacity of a NoSQL database scales linearly, which implies that if you add 4 servers to a single server, the total capacity becomes five times the original rather than something less because of performance loss. The NoSQL cluster does not have to be shut down to add new servers. Ease of scalability is provided by the shared-nothing architecture of NoSQL databases. The monolithic architecture of traditional SQL databases is not suited to the flexible requirements of storing and processing big data: it supports scale-up (vertical scaling), in which additional resources are added to a single machine. In contrast, NoSQL databases provide a scale-out (horizontal scaling), shared-nothing architecture, in which additional machines are added to the cluster. In a shared-nothing architecture, the nodes in a cluster do not share any resources, and the data is partitioned evenly across the cluster (which balances the load) by a process called sharding.
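One simple way to picture sharding is hash partitioning: each key is hashed, and the hash decides which server owns it. The sketch below is an assumption-laden illustration, not the partitioning scheme of any particular NoSQL product; shard_for and distribute are hypothetical helper names, and the five shards match the "one server plus four more" example above.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard number using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def distribute(keys, num_shards):
    """Place every key on the shard chosen by shard_for."""
    shards = [[] for _ in range(num_shards)]
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards

keys = [f"user:{i}" for i in range(10_000)]
shards = distribute(keys, 5)
sizes = [len(shard) for shard in shards]  # roughly 2,000 keys per shard
```

Plain modulo hashing moves most keys when the number of shards changes, which is why production systems commonly use variations such as consistent hashing; the sketch only shows the even spread of data across nodes.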

Ultra-high Availability

Why is high availability important? Because interactive real-time applications serving many users need to be available all the time. An application cannot be taken offline for maintenance, a software or hardware upgrade, or a capacity increase. NoSQL databases are designed to minimize downtime, though different NoSQL databases provide different levels of support for online maintenance and upgrades.

Commodity Hardware

NoSQL databases are designed to be installed on commodity hardware instead of high-end hardware. Commodity hardware is easier to scale out: simply add another machine, and the new machine does not even have to match the specification and configuration of the machines already in the NoSQL database cluster.

Flexible Schema or No Schema

While relational databases store data in a fixed tabular format for which the schema must be defined before adding data, NoSQL databases either do not require a schema to be defined or provide a flexible, dynamic schema. Some NoSQL databases, such as Oracle NoSQL Database and Apache Cassandra, have a provision for a flexible schema definition, while others, such as Couchbase, are schema-less in that the schema is not defined at all. The support for flexible schemas or no schemas makes NoSQL databases suitable for structured, semi-structured, and unstructured data. In an agile development setting the schema definition for data stored in a database may need to change, which makes NoSQL databases suitable for such an environment. Dissimilar data may be stored together.

Flexible schemas make development faster, code integration uninterrupted by modifications to the schema, and database administration almost redundant.

Big Data

NoSQL databases are designed for big data. Big data is on the order of tens or even hundreds of petabytes (PB). Big data is usually associated with a large number of users and a large number of transactions.

Object-Oriented Programming

The data models provided by NoSQL databases support object-oriented programming, which is both easy to use and flexible. Most NoSQL databases are supported by APIs in object-oriented programming languages such as Java, PHP, and Ruby. All client APIs support simple put and get operations to add and get data.

Performance

Why is performance important? Because interactive real-time applications require low latency for read and write operations for all types and sizes of workloads. Applications need to serve millions of users concurrently at different workloads. The shared-nothing architecture of NoSQL databases provides low latency, high availability, reduced susceptibility to failure of critical sections, and reduced bandwidth requirement. The performance in a NoSQL database cluster does not degrade with the addition of new nodes.

Failure Handling

NoSQL databases typically handle server failure automatically by failing over to another server. Why is auto-failover important? Because without it, if a node in a cluster that was handling a workload were to fail, the application would fail and become unavailable. NoSQL databases typically consist of a cluster of servers and are designed with the failure of some nodes as expected and unavoidable. With a large number of nodes in a cluster the database does not have a single point of failure, and the failure of a single node is handled transparently, with the load of the failed server being transferred to another server.
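The failover idea can be sketched as a client that tries each replica in turn until one answers. This is a toy model under assumptions: Node, NodeDown, and read_with_failover are hypothetical names, every replica is assumed to hold a full copy of the data, and real databases detect failures and reroute requests inside the cluster rather than in application code.

```python
class NodeDown(Exception):
    """Raised when a request reaches a node that is not running."""

class Node:
    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)
        self.alive = True

    def get(self, key):
        if not self.alive:
            raise NodeDown(self.name)
        return self.data[key]

def read_with_failover(replicas, key):
    """Return (value, serving node name), skipping replicas that are down."""
    for node in replicas:
        try:
            return node.get(key), node.name
        except NodeDown:
            continue  # transparent to the caller: try the next replica
    raise NodeDown("all replicas unavailable")

data = {"user:1": "Ann"}
cluster = [Node("node-a", data), Node("node-b", data), Node("node-c", data)]
cluster[0].alive = False  # simulate the failure of one server
value, served_by = read_with_failover(cluster, "user:1")
```

The caller still gets its answer after node-a fails; only when every replica is down does the read fail.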

Less Administration

NoSQL databases are easier to install and administer, and do not strictly need specialized DBAs. Although a developer is able to handle the administration of a NoSQL database, a specialized NoSQL DBA should still be used. Schemas are flexible and do not need to be modified periodically. Failure detection and failover are automatic and require no user intervention. Data distribution in the cluster is automatic using auto-sharding. Data replication to the nodes in a cluster is also automatic. When a new server node is added to a cluster, data is automatically distributed and replicated to the new node as required.

Cloud Enabled

Cloud computing has made unprecedented capacity and flexibility in choice of infrastructure available. Cloud service providers such as Amazon Web Services (AWS) provide fully managed NoSQL database services and also the option to develop custom NoSQL database services.

Advantages of RDBMS

While much has been discussed about their merits, NoSQL databases are not without drawbacks. Some of the aspects in which relational databases have advantages are as follows.

Transactional Properties

NoSQL databases generally do not provide the Atomicity, Consistency, Isolation, and Durability (ACID) properties in transactions that relational databases do.

    • Atomicity ensures that either all the tasks within a transaction are performed or none are performed.
    • Consistency ensures that the database is always in a consistent state without any partially completed transactions.
    • Isolation implies that transactions are isolated and do not have access to the data of other transactions until those transactions have completed. Isolation provides consistency and performance.
    • Durability ensures that once a transaction has completed, its changes persist, even after a system failure.
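Atomicity in particular is easy to demonstrate. The following sketch uses SQLite from the Python standard library as a stand-in for a relational database (it is not Oracle, and the accounts table and transfer function are hypothetical). A transfer credits one account and then debits another; when the debit violates a constraint, the credit is rolled back too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; nothing was applied.
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

ok = transfer(conn, "alice", "bob", 500)  # would overdraw alice
after_failed = balances(conn)             # both balances are unchanged
```

Even though the credit to bob ran first, the failed debit undoes it, so the database never exposes a half-finished transfer.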

NoSQL databases provide Basically Available, Soft state, and Eventually consistent (BASE) transactional properties.

    • Basically Available implies that a NoSQL database returns a response to every request though the response could be a failure to provide the requested data, or the requested data could be returned in an inconsistent state.
    • Soft state implies that the state of the system could be in transition during which time the state is not consistent.
    • Eventually consistent implies that when the database stops receiving input, eventually the state of a NoSQL database becomes consistent when the data has replicated to the different nodes in the cluster as required. But, while a NoSQL database is receiving input, the database does not wait for its state to become consistent before receiving more data.
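A toy simulation can make "soft state" and "eventually consistent" more tangible. It is a sketch under assumptions, not how any specific database replicates: EventuallyConsistentStore is a hypothetical class in which a write is acknowledged by one replica immediately and copied to the other replicas later from a queue.

```python
from collections import deque

class EventuallyConsistentStore:
    def __init__(self, num_replicas=3):
        self.replicas = [{} for _ in range(num_replicas)]
        self.pending = deque()  # replication work that has not happened yet

    def put(self, key, value):
        """Accept the write on replica 0 at once; replicate to the rest later."""
        self.replicas[0][key] = value
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, value))

    def get(self, key, replica):
        """Read from one replica; the answer may be stale."""
        return self.replicas[replica].get(key)

    def replicate_one(self):
        """Apply a single queued update to its target replica."""
        if self.pending:
            i, key, value = self.pending.popleft()
            self.replicas[i][key] = value

    def settle(self):
        """Drain the queue, as happens once input stops arriving."""
        while self.pending:
            self.replicate_one()

store = EventuallyConsistentStore()
store.put("x", 1)
stale_read = store.get("x", replica=2)   # None: the update has not arrived yet
store.settle()
settled_reads = [store.get("x", replica=i) for i in range(3)]
```

Between put and settle the replicas disagree (soft state); after the queue drains, every replica returns the same value (eventual consistency).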

Stable and Reliable

The NoSQL databases are still new to the field of databases and not as functionally stable and reliable as the established relational databases.

Established Vendor Support

Most NoSQL databases such as MongoDB and Apache Cassandra are open source projects and lack the official support provided by established databases such as Oracle database or IBM DB2 database.

Conclusion

NoSQL/NonSQL databases such as MongoDB (or Apache Cassandra or Couchbase) shall never completely replace relational databases such as Oracle database because the NoSQL databases are designed for a different use case, which is big, unstructured data, for example the web scale data used by search engines. Small scale enterprises (and even some larger ones) would continue to use relational databases for their superior transactional properties, stability & reliability, and established support base.

Abu Fazal Abbas



In the past few years, NoSQL databases have emerged as a suitable alternative to traditional databases (like Oracle and MSSQL) in a large number of implementations, and many organizations have already started to adopt them. Considering the market trend, would it be right to say that one day NoSQL databases will completely replace relational databases like Oracle?

Well, in today’s discussion we will try to debate this question.

Until the NoSQL disruption, relational databases like Oracle or MSSQL were the only option for any database need, and they served the purpose very well. However, with the advent of Big Data (Volume, Velocity, Variety) there arose a need for more scalable, flexible, high-performance, and highly available databases.

Traditional relational databases like Oracle are suitable for structured data and are best known for their ACID compliance and transactional features, besides the unique functionality that each of them provides. However, they do not perform well at scale, and their predefined schema layout restricts them from processing unstructured data with ease and is considered a bottleneck for agile development. Further, there is no way to guarantee 100% availability in a relational database. In today’s world of IoT (Internet of Things), businesses demand 100% availability, which means that no downtime is tolerable, be it for maintenance or for system upgrades.

This is where the NoSQL database comes into the picture. A NoSQL (Not Only SQL) database is fundamentally based on a distributed architecture, in which a set of machines forms the database and the data is distributed and replicated among the machines to achieve scalable performance and availability. Further, NoSQL databases provide flexible schema structures in which the structure can be defined at read time instead of write time, which helps in agile development of applications. NoSQL databases are known for their high performance at scale (due to data locality and the absence of joins), where scaling is achieved by adding more machines (known as horizontal scaling) rather than by beefing up the existing machines with more hardware resources like CPU and RAM (known as vertical scaling), as in the case of relational databases.
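Defining structure at read time, often called schema-on-read, can be sketched as follows. The example assumes records are stored as raw JSON text and that each reader projects only the fields it cares about; read_as, raw_events, and the field names are hypothetical.

```python
import json

# Records are written as-is, with no structure imposed at write time.
raw_events = [
    '{"user": "ann", "action": "login", "ts": 1476000000}',
    '{"user": "bob", "action": "purchase", "amount": 19.99}',
]

def read_as(raw, fields, defaults=None):
    """Apply a structure while reading: keep only fields, filling in gaps."""
    defaults = defaults or {}
    doc = json.loads(raw)
    return {field: doc.get(field, defaults.get(field)) for field in fields}

# Two readers impose two different structures on the same stored data.
activity_view = [read_as(r, ["user", "action"]) for r in raw_events]
billing_view = [read_as(r, ["user", "amount"], {"amount": 0.0}) for r in raw_events]
```

Adding a new field to future events requires no change to what is already stored; only the readers that need the field have to know about it.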

However, like relational databases, NoSQL databases have their own advantages and disadvantages. While NoSQL databases scale well with high performance and provide high availability, they fall short of relational databases in complying with ACID properties and providing transactional features. Further, the query languages used to access NoSQL databases differ from one database to another, whereas relational databases share the common SQL standard. Additionally, NoSQL query languages are not as rich as SQL, and they lack a counterpart to PL/SQL, with which application logic can be integrated within the database. In a NoSQL database there is no scope for integrating business logic within the database, and developers must implement all the business logic in the application layer.

Another important feature of relational databases is that the schema layout is designed based on data normalization, whereas in the NoSQL world data is denormalized to meet query needs because of the lack of join mechanisms. This denormalization leads to data duplication, which in turn consumes more space than a relational database would. Since no join mechanism is available in the NoSQL world, one must model the physical layout based on the business (query) needs; inappropriate data modeling may force a system redesign and, in turn, lead to project failure.

Relational databases can enforce relations between data through normalization and integrity constraints, whereas in a NoSQL database there is no way to define relations between data, and that is the biggest obstacle to implementing data integrity within the database.
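What enforcing a relation means in practice can be sketched with a foreign key. As before, SQLite stands in for a relational database and a list of dictionaries stands in for a document collection; the table names, add_order, and orders_collection are hypothetical, and the document side shows only the unconstrained default described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        item TEXT NOT NULL
    )
""")
conn.execute("INSERT INTO customers VALUES (1, 'Ann')")
conn.commit()

def add_order(order_id, customer_id, item):
    """Insert an order; the database refuses orders for unknown customers."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO orders VALUES (?, ?, ?)", (order_id, customer_id, item)
            )
        return True
    except sqlite3.IntegrityError:
        return False

valid = add_order(10, 1, "keyboard")     # customer 1 exists
dangling = add_order(11, 99, "monitor")  # customer 99 does not exist

# Document side: nothing stops a reference to a customer that does not exist,
# so the application has to perform this check itself.
orders_collection = [{"_id": 11, "customer_id": 99, "item": "monitor"}]
```

The relational database rejects the dangling order on its own, while the document collection accepts it and leaves integrity checking to application code.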

There is no way that we can say NoSQL will eventually replace Oracle or, for that matter, relational databases as the data store. Both technologies have their own pros and cons, and they cater to different database needs. NoSQL databases were invented not to replace relational databases but to answer problems that couldn’t be addressed using relational databases, such as storing and processing unstructured data with scalable performance while guaranteeing 100% availability.

NoSQL databases are designed around the CAP theorem, and different NoSQL databases adhere to different sets of CAP properties. Further, there are different types of NoSQL storage, such as “Key-Value store”, “Document Store”, “Column Store” and “Graph Store”, each with a different use case. We can’t universally accept a single NoSQL database for all purposes.

Just as different NoSQL databases have different use cases, Oracle too has its own use cases, and it is here to stay. The strongest point for choosing Oracle would be its ACID compliance, besides the other amazing features and functionality that it provides. With Oracle, most of the functionality needed by an application (like Data Redaction, change history tracking (aka FBDA), Information Lifecycle Management (ILM), and so on) is already incorporated within the database, and application developers need not write additional code to implement it.

Relational databases like Oracle are very mature (be it the SQL engine or the functionality) and have evolved over the last few decades, whereas NoSQL databases are still in an early phase and have a long way to go before they can be considered a direct alternative to relational databases. We can treat a relational database in a NoSQL way by laying out the data in patterns that avoid joins; however, there is no way to treat a NoSQL database in a relational way, owing to its lack of functionality.

In a nutshell, NoSQL databases can be considered when there is a huge volume of data (probably hundreds of terabytes, or petabytes) to be read or written at high performance with 100% availability. However, we have to trade off relational features (like data integrity, ACID properties, and transactions) to achieve that scalable performance.

Considering the limitations around the NoSQL world, it would be safe to say that relational databases are here to stay (at least for a few more decades) until NoSQL databases become mature enough to provide the fundamental functionality of relational databases.


Fernando Martin Garcia

The term NoSQL does not refer exclusively to a query language, but rather to new ways of modeling data that differ from the relational data model. While Oracle relational database structures are based on tables and rows tied to a predetermined schema, MongoDB uses collections and JSON documents that do not conform to a fixed schema.

From the advent of the relational model, described by Edgar F. Codd in 1969, until today, the relational model has steadily gained ground in the world of databases. Eventually it became the almost unquestioned de facto data model for just about any kind of project. Facing a new software development challenge, IT people could propose different technologies and architectures for implementation, but hardly anyone would question whether or not to use a relational database. Today, with the advent of Big Data and NoSQL, relational databases are no longer the only alternative. And with each new software development challenge, questions like the following arise: What kind of database should we use in this situation? Will it work better on an Oracle relational database with tables and rows? Is it better to store data in JSON documents and MongoDB collections? What other alternatives do we have?

As we can see, the world of databases is beginning to be framed within a new paradigm consisting of an ecosystem of data models in which the relational and the NoSQL models can perfectly coexist.

Today we do not have a single choice but rather a range of possibilities. Which one of all the possible alternatives is the best?

Before approaching an answer, we must introduce the following guarantees that any distributed system should provide (in this context a distributed system is a collection of interconnected nodes that share data):

    • Consistency - A read is guaranteed to return the most recent write for a given client.
    • Availability - A non-failing node will return a reasonable response within a reasonable amount of time (no error or timeout).
    • Partition Tolerance - The system will continue to function when network partitions occur.

The CAP theorem, presented by computer scientist Eric Brewer of the University of California, Berkeley in 2000, states that in a distributed system you can have only two of the three guarantees above across a write-read pair.

In other words, it is impossible for a distributed system to simultaneously provide all the three guarantees. Based on our priorities for consistency, availability and partition tolerance, the CAP theorem could be a valuable tool to take into consideration when evaluating which kind of database to use.
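The trade-off can be illustrated with a deliberately tiny model: two nodes that normally replicate every write to each other, and a network partition that cuts them apart. The sketch is an assumption-based illustration and not a description of any real database; TwoNodeStore, Unavailable, and run are hypothetical names, and "CP" and "AP" label which guarantee the store keeps during the partition.

```python
class Unavailable(Exception):
    """Raised when a node refuses a request in order to stay consistent."""

class TwoNodeStore:
    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.nodes = [{}, {}]
        self.partitioned = False

    def write(self, node, key, value):
        if not self.partitioned:
            # Healthy network: every write reaches both nodes.
            for replica in self.nodes:
                replica[key] = value
        elif self.mode == "CP":
            # Keep consistency, give up availability.
            raise Unavailable("cannot reach the other node; write refused")
        else:
            # Keep availability, give up consistency: only one node is updated.
            self.nodes[node][key] = value

    def read(self, node, key):
        return self.nodes[node].get(key)

def run(mode):
    """Write once normally, then once during a partition, and read both nodes."""
    store = TwoNodeStore(mode)
    store.write(0, "k", "v1")
    store.partitioned = True
    try:
        store.write(0, "k", "v2")
        accepted = True
    except Unavailable:
        accepted = False
    return accepted, store.read(0, "k"), store.read(1, "k")

cp_result = run("CP")  # write refused, both nodes still agree
ap_result = run("AP")  # write accepted, nodes now disagree
```

In CP mode the second write is rejected but both nodes still return the same value; in AP mode the write succeeds but a read from the other node returns the older value.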

Finally, there is always the possibility of hybrid scenarios. MongoDB has published use cases of this nature: existing MySQL-based applications with new features added using MongoDB.

