Tutorial 07 – Data Persistence

1. Discuss the role of data in information systems indicating the need for data persistence

2. Explain the terms: Data, Database, Database Server, and Database Management System
Data : Raw, unorganized facts that become information once they are processed and organized
Database : A collection of information that is organized so that it can be easily accessed, managed and updated
Database Server : A server (the machine, the software, or both) that runs a database management system and provides database services to client programs, following the client-server model
Database Management System : System software for creating and managing databases. The DBMS provides users and programmers with a systematic way to create, retrieve, update and manage data.

3. Compare Files and Databases, discussing pros and cons of them

File System
Pros of the File System

Performance can be better than with a database. If you store large files in the database, a simple query to retrieve the list of files or file names will also load the file data if it uses SELECT *, which can slow things down. In a file system, accessing a file is simple and lightweight.
Saving and downloading files is much simpler than in a database: a simple "Save As" function stores the file, and a download only needs a URL pointing to the location of the saved file.
Migrating the data is easy: copy the folder to the desired destination, making sure write permissions are granted there.
In most cases it is more cost-effective to expand your web server than to pay for certain databases.
It is easy to migrate the files later to cloud storage, e.g. Amazon S3 or a CDN.
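
The save, read and migrate steps described above can be sketched with the standard java.nio.file API. This is only a minimal illustration: the class name FileStoreSketch, its method names and the file names are assumptions made for the example, not part of any particular application.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, for illustration only.
class FileStoreSketch {
    // Saves the text under dir/name and returns the path of the stored file.
    static Path save(Path dir, String name, String content) throws IOException {
        Files.createDirectories(dir);
        Path target = dir.resolve(name);
        Files.write(target, content.getBytes(StandardCharsets.UTF_8));
        return target;
    }

    // Reads a stored file back as text.
    static String load(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    // "Migration" is a plain copy of the file into another directory.
    static Path migrate(Path file, Path newDir) throws IOException {
        Files.createDirectories(newDir);
        return Files.copy(file, newDir.resolve(file.getFileName()),
                StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("uploads");
        Path saved = save(base.resolve("a"), "note.txt", "hello");
        Path moved = migrate(saved, base.resolve("b"));
        System.out.println(load(moved));
    }
}
```

Nothing here ties the file to a database row, which is exactly the gap the cons listed next describe.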

Cons of the File System

Loosely coupled. The file system offers no ACID (Atomicity, Consistency, Isolation, Durability) guarantees tying the files to the database records that reference them. If files are deleted from their location, manually or by an attacker, the application may not know whether a given file still exists.
Low security. Files are saved in a folder that must have write permissions, which makes them prone to security issues such as unauthorized modification. Avoid the file system if you cannot afford to compromise on security.

Database
Pros of Database

ACID consistency, including rollback of an update, which is complicated to achieve when files are stored outside the database.
Files stay in sync with the database and cannot be orphaned, which makes transactions easier to track.
Backups automatically include the file binaries.
Storing files in the database is more secure than saving them in the file system.

Cons of Database

You may have to convert the files to BLOBs in order to store them in the database.
Database backups become larger and slower.
Memory use can be inefficient. RDBMSs are often RAM-driven, so all data has to pass through RAM first. When an RDBMS finds and sorts data, it tracks each data page, even for the smallest read or write, including whether the page is in memory or on disk and whether it is indexed or physically sorted.

4. Discuss different arrangements of data, giving examples for each
•Un-structured
Unstructured data (or unstructured information) is information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well.

Examples include e-mail messages, word processing documents, videos, photos, audio files, presentations, webpages and many other kinds of business documents.

•Semi-structured
Semi-structured data is data that is neither raw data nor typed data in a conventional database system. It is structured data, but it is not organized in a relational model, like a table or an object-based graph. A lot of data found on the Web can be described as semi-structured. Data integration especially makes use of semi-structured data.

Examples of semi-structured data : XML and JSON documents are semi-structured, and CSV files are often counted as well. NoSQL databases are also considered semi-structured.

•Structured
Structured data refers to any data that resides in a fixed field within a record or file. This includes data contained in relational databases and spreadsheets.

Examples of structured data include numbers, dates, and groups of words and numbers called strings. A commonly cited estimate is that this kind of data accounts for only about 20 percent of all data. Structured data is the data you are probably used to dealing with, and it is usually stored in a database.
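
To make the three arrangements concrete, here is a small Java sketch that holds roughly the same facts about a student in each form. The class DataArrangements, the field names and the sample values are all assumptions invented for the illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical example types and values, for illustration only.
class DataArrangements {
    // Structured: every record has the same fixed, typed fields,
    // like a row in a relational table.
    static final class StudentRow {
        final int id;
        final String name;
        final int age;

        StudentRow(int id, String name, int age) {
            this.id = id;
            this.name = name;
            this.age = age;
        }
    }

    // Semi-structured: self-describing keys, as in a JSON document.
    // Fields may be missing (no "age" here) or nested (a list of phones).
    static Map<String, Object> studentDocument() {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", 1);
        doc.put("name", "Asha");
        doc.put("phones", Arrays.asList("555-0100", "555-0101"));
        return doc;
    }

    // Unstructured: free text with no predefined data model.
    static final String EMAIL_BODY =
            "Hi, just a reminder that Asha (student no. 1) turned 20 last week.";

    public static void main(String[] args) {
        StudentRow row = new StudentRow(1, "Asha", 20);
        System.out.println(row.name + " is " + row.age);
        System.out.println(studentDocument());
        System.out.println(EMAIL_BODY);
    }
}
```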

5. Explain different types of databases, providing examples for their use

Hierarchical Databases
In the hierarchical database model, data is stored as nodes in parent-child relationships. Besides the actual data, records in a hierarchical database also hold information about their parent/child groupings.

Example : IBM Information Management System (IMS) and RDM Mobile

Network Databases
Network database management systems (network DBMSs) use a network structure to create relationships between entities. They are mainly used on large digital computers. A network database resembles a hierarchical database, but whereas a hierarchical node can have only one parent, a network node can have relationships with multiple entities. A network database looks more like a cobweb or an interconnected network of records.

Example : If we design a school database, Student is an entity with attributes such as name, age and address. Because an address is generally complex, it can be a separate entity with attributes such as street name, pincode and city, with a relationship between the two entities.

Relational Databases
In relational database management systems (RDBMS), the relationship between data is relational, and data is stored in tabular form as columns and rows. Each column of a table represents an attribute and each row represents a record. Each field in a table represents a data value.
Structured Query Language (SQL) is the language used to query an RDBMS, including inserting, updating, deleting, and searching records.

Example : Most well known DBMS applications fall into the RDBMS category. Examples include Oracle Database, MySQL, Microsoft SQL Server, and IBM DB2

Object-Oriented Model
This model brings the features of object-oriented programming to the database. It does more than store programming-language objects: an object DBMS extends the semantics of languages such as C++ and Java to provide full-featured database programming capability while retaining native language compatibility. In other words, it adds database functionality to object programming languages. This approach unifies application and database development into a consistent data model and language environment. As a result, applications require less code, use more natural data modeling, and have code bases that are easier to maintain, and object developers can write complete database applications with a modest amount of additional effort.

Graph Databases
Graph databases are NoSQL databases that use a graph structure for semantic queries. Data is stored in the form of nodes, edges, and properties. A node represents an entity or instance such as a customer, a person, or a car, and is equivalent to a record in a relational database system. An edge represents a relationship that connects nodes. Properties are additional information attached to the nodes.
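
The node, edge and property vocabulary can be sketched as a tiny in-memory structure in Java. This is not how any real graph database is implemented or queried; the class GraphSketch, the "OWNS" label and the sample nodes are assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model, for illustration only.
class GraphSketch {
    // A node is an entity; its properties are extra key/value information.
    static final class Node {
        final String id;
        final Map<String, String> properties = new HashMap<>();

        Node(String id) {
            this.id = id;
        }
    }

    // An edge is a labelled, directed relationship between two nodes.
    static final class Edge {
        final Node from;
        final String label;
        final Node to;

        Edge(Node from, String label, Node to) {
            this.from = from;
            this.label = label;
            this.to = to;
        }
    }

    private final List<Edge> edges = new ArrayList<>();

    void connect(Node from, String label, Node to) {
        edges.add(new Edge(from, label, to));
    }

    // Returns the nodes reachable from 'from' over edges with the given label.
    List<Node> neighbours(Node from, String label) {
        List<Node> result = new ArrayList<>();
        for (Edge e : edges) {
            if (e.from == from && e.label.equals(label)) {
                result.add(e.to);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        GraphSketch graph = new GraphSketch();
        Node person = new Node("alice");
        person.properties.put("type", "person");
        Node car = new Node("car-1");
        car.properties.put("type", "car");
        graph.connect(person, "OWNS", car);
        System.out.println(graph.neighbours(person, "OWNS").get(0).id);
    }
}
```

The query walks relationships directly instead of joining tables, which is the main practical difference from the relational model.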

ER Model Databases
An ER model is typically implemented as a database. In a simple relational database implementation, each row of a table represents one instance of an entity type, and each field in a table represents an attribute type. In a relational database a relationship between entities is implemented by storing the primary key of one entity as a pointer or "foreign key" in the table of another entity.
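
As a rough illustration of the foreign-key idea, the Java sketch below uses two maps as stand-ins for tables indexed by primary key. The Student and Address types, their fields and the sample values are hypothetical; a real relational database would enforce the key constraint itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for two related tables.
class ForeignKeySketch {
    // One row of the "address" table; id is its primary key.
    static final class Address {
        final int id;
        final String city;

        Address(int id, String city) {
            this.id = id;
            this.city = city;
        }
    }

    // One row of the "student" table; addressId is a foreign key that
    // stores the primary key of a row in the address table.
    static final class Student {
        final int id;
        final String name;
        final int addressId;

        Student(int id, String name, int addressId) {
            this.id = id;
            this.name = name;
            this.addressId = addressId;
        }
    }

    // Each map plays the role of a table indexed by primary key.
    final Map<Integer, Address> addresses = new HashMap<>();
    final Map<Integer, Student> students = new HashMap<>();

    // Resolves the relationship by following the foreign key, as a join would.
    String cityOf(int studentId) {
        Student s = students.get(studentId);
        return addresses.get(s.addressId).city;
    }

    public static void main(String[] args) {
        ForeignKeySketch db = new ForeignKeySketch();
        db.addresses.put(10, new Address(10, "Springfield"));
        db.students.put(1, new Student(1, "Asha", 10));
        System.out.println(db.cityOf(1));
    }
}
```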

6. Compare and contrast data warehouse with Big data

A data warehouse is typically built on a relational database, so storing and fetching data works much like ordinary SQL queries. Big data does not follow a fixed database structure, so tools such as Hive or Spark SQL are needed to query it, using their own query dialects. Data loaded into a data warehouse is meant to be used for analytics and reporting.

7. Explain how the application components communicate with files and databases

8. Differentiate the SQL statements, Prepared statements, and Callable statements

Statement executes a static SQL statement. It is used for ordinary SQL queries whose text does not change.
PreparedStatement executes a precompiled SQL statement. It is used for dynamic or parameterized queries, with values bound to ? placeholders.
CallableStatement executes SQL stored procedures and functions, including ones that return values through OUT parameters.
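
The three interfaces can be compared side by side in a short JDBC sketch. The java.sql calls are standard, but the students table, the city column and the count_students_in_city procedure are hypothetical names, and the methods assume the caller already holds an open Connection to a database where they exist.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.sql.Types;

// Hypothetical table and procedure names, for illustration only.
class StatementKinds {
    static final String STATIC_SQL = "SELECT COUNT(*) FROM students";
    static final String PREPARED_SQL = "SELECT COUNT(*) FROM students WHERE city = ?";
    static final String CALL_SQL = "{call count_students_in_city(?, ?)}";

    // Statement: the SQL text is fixed and sent to the database as-is.
    static int countAll(Connection con) throws SQLException {
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(STATIC_SQL)) {
            rs.next();
            return rs.getInt(1);
        }
    }

    // PreparedStatement: the SQL is precompiled and the value is bound to
    // the ? placeholder instead of being concatenated into the text.
    static int countInCity(Connection con, String city) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(PREPARED_SQL)) {
            ps.setString(1, city);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return rs.getInt(1);
            }
        }
    }

    // CallableStatement: invokes a stored procedure; the second parameter
    // is registered as an OUT parameter and read after execution.
    static int countInCityViaProcedure(Connection con, String city) throws SQLException {
        try (CallableStatement cs = con.prepareCall(CALL_SQL)) {
            cs.setString(1, city);
            cs.registerOutParameter(2, Types.INTEGER);
            cs.execute();
            return cs.getInt(2);
        }
    }
}
```

Because the value in countInCity is bound rather than pasted into the SQL string, user input cannot change the structure of the query, which is the usual reason to prefer PreparedStatement over Statement for anything parameterized.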

9. Argue the need for ORM, explaining the development with and without ORM

10. Discuss the POJO, Java Beans, and JPA, indicating their similarities and differences

POJO (Plain Old Java Object): The term was initially introduced to designate a simple, lightweight Java object that does not implement any javax.ejb interface, as opposed to the heavyweight EJB 2.x components (especially Entity Beans). Today, the term is used for any simple object with no framework-imposed requirements.

JavaBeans: JavaBeans are reusable software components for Java that can be manipulated visually in a builder tool. Practically, they are classes written in the Java programming language conforming to a particular convention. They are used to encapsulate many objects into a single object (the bean), so that they can be passed around as a single bean object instead of as multiple individual objects. A JavaBean is a Java Object that is serializable, has a nullary constructor, and allows access to properties using getter and setter methods.

Enterprise JavaBeans (EJB) is a managed, server-side component architecture for the modular construction of enterprise software, and one of several Java APIs. An EJB is a server-side software component that encapsulates the business logic of an application.
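
The difference between a POJO and a JavaBean is easiest to see in code. In the sketch below, the class names Point and StudentBean and their fields are invented for the example; only the JavaBean conventions themselves (Serializable, no-argument constructor, getters and setters) come from the description above.

```java
import java.io.Serializable;

// Hypothetical example classes, for illustration only.
class PojoVsBean {
    // POJO: a plain class with no required interface, base class or naming
    // convention. Here it is immutable and has no setters at all.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // JavaBean: still a POJO, but it also follows the bean conventions:
    // serializable, public no-argument constructor, getter/setter properties.
    public static class StudentBean implements Serializable {
        private static final long serialVersionUID = 1L;

        private String name;
        private int age;

        public StudentBean() {
        }

        public String getName() {
            return name;
        }

        public void setName(String name) {
            this.name = name;
        }

        public int getAge() {
            return age;
        }

        public void setAge(int age) {
            this.age = age;
        }
    }

    public static void main(String[] args) {
        Point p = new Point(1, 2);
        StudentBean bean = new StudentBean();
        bean.setName("Asha");
        bean.setAge(20);
        System.out.println(p.x + "," + p.y + " " + bean.getName() + " " + bean.getAge());
    }
}
```

Every JavaBean is a POJO, but not every POJO is a JavaBean: Point fails the bean conventions on all three counts.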

11. Identify the ORM tools available for different development platforms (Java, PHP, and .Net)

12. Discuss the need for NoSQL indicating the benefits, also explain different types of NoSQL databases
NoSQL refers to non-relational DBMSs that do not require a fixed schema, avoid joins, and are easy to scale. NoSQL databases are used for distributed data stores with very large storage needs, such as big data and real-time web apps.

Examples include companies such as Twitter, Facebook and Google, which collect terabytes of user data every single day.

13. Discuss what Hadoop is, explaining the core concepts of it

14. Explain the concept of IR, identifying tools for IR

Retrieval tools are systems created for the retrieval of information. They are essential building blocks for any system that organizes the recorded information collected by libraries, archives and museums.
