The Oracle Grid
Introduction
Mainframes were long the leaders in computer performance in large client/server environments, but they are being replaced by the Oracle Grid: a collection of servers that use the latest hardware technologies to give clients faster and more reliable access. If a mainframe fails, everything running on it stops, whereas the Oracle Grid keeps running when one of its servers fails. Enterprises are looking for ways to increase efficiency and reduce the costs of their systems and processes. In the grid, pools of computing resources such as data farms are created and allocated dynamically to the various departments of an organization as their requirements demand. This is analogous to the electricity supply: you do not need to know where the power comes from; you use it whenever you need it, and as much of it as you want. This is known as grid computing.
In the grid, components such as storage, processors, databases and application servers are integrated through virtualization. Virtualization provides a logical rather than physical view of data, computing power, storage capacity and other resources. Vendors such as HP and IBM deliver hardware technologies for virtualization. Ex. HP ProLiant Blade servers with Intel Xeon processors.
Hardware technologies used in the grid
- Processors
Symmetric multiprocessing (SMP), once an expensive architecture, has become commonplace with the release of low-cost, high-volume 64-bit processors such as Intel's Itanium 2, Sun's UltraSPARC and IBM's 64-bit PowerPC.
- Blade Servers
A blade server is a complete computer system on a single board, including processor(s), memory, a network connection and storage. Blades address the need of large-scale computing centres to reduce the space taken up by application servers and to lower costs. A typical application is serving web pages. Server blades, together with storage blades, are mounted in racks within a cabinet and share common cabling, redundant power supplies and cooling fans. Blade systems also have remote management capabilities that make life easier for data centre administrators. Ex. HP ProLiant BL (HP BladeSystem)
- Networked Storage
A network-attached storage (NAS) device is a server dedicated to file sharing. NAS does not perform any of the other services that a server in a server-centric system typically provides, such as e-mail, authentication or file management. NAS allows hard disk storage to be added to a network that already uses servers without shutting them down for maintenance and upgrades. With a NAS device, storage is not an integral part of the server. Instead, in this storage-centric design, the server still handles all of the data processing, but the NAS device delivers the data to the user. A NAS device does not need to be located within the server; it can sit anywhere on a LAN and can be made up of multiple networked NAS devices. Ex. Veritas ServPoint NAS, powered by Sun Enterprise servers, and InoStor InteliNAS, which has a capacity of 4.8 TB.
- Storage Area Network [SAN]
A storage area network operates behind the servers to provide a common link between servers and storage, allowing administrators to scale storage or server processing power independently as requirements demand. It allows multiple servers to access the same data, so duplication of information can be reduced, and it permits data backup to take place directly over storage channels, eliminating the bottleneck of the relatively slow LAN. Data is also more consistently available, as the failure of a single server will not cut off any storage from the remaining servers.
Ex. The HP StorageWorks SAN family offers:
- Entry-level SAN - Ideal for small to medium deployments and scalable to terabytes of capacity. Eg: HP Modular Smart Array 1000 (MSA1000).
- Mid-range SAN - Targeted towards medium to large size data centres offering multiple terabytes of capacity and high performance. Eg: HP Enterprise Virtual Array 3000 (EVA3000).
- Enterprise class SAN - HP's flagship virtual disk array for enterprise-wide deployment and mission-critical applications. Eg: HP Enterprise Virtual Array 5000 (EVA5000).
- Network Interconnects
- Gigabit Ethernet
Gigabit Ethernet, abbreviated GbE, is a version of Ethernet that supports data transfer rates of 1 gigabit (1,000 megabits) per second. The first Gigabit Ethernet standard (802.3z) was ratified by the IEEE 802.3 Committee in 1998. Vendors of Gigabit Ethernet solutions include Cisco and Intel.
- InfiniBand Interconnect
InfiniBand is an architecture and specification for data flow between processors and I/O devices that promises greater bandwidth and almost unlimited expandability in future computer systems. At the time of writing, InfiniBand was expected to gradually replace the Peripheral Component Interconnect (PCI) shared-bus approach used in most personal computers and servers. Offering throughput of up to 2.5 gigabytes per second and support for up to 64,000 addressable devices, the architecture also promises increased reliability, better sharing of data between clustered processors, and built-in security. InfiniBand is the result of merging two competing designs: Future I/O, developed by Compaq, IBM and Hewlett-Packard, and Next Generation I/O, developed by Intel, Microsoft and Sun Microsystems.
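To put the Gigabit Ethernet rate in storage terms, the sketch below converts a link rate into megabytes per second and a best-case transfer time. It assumes decimal units (1 Gbit = 10^9 bits, 1 GB = 10^9 bytes) and ignores Ethernet and TCP/IP overhead, so real throughput will be lower; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: line rate in MB/s for a link of RATE_GBIT gigabits/s.
# Decimal units: 1 Gbit/s = 1,000 Mbit/s = 125 MB/s.
line_rate_mbytes() {
    awk -v rate="$1" 'BEGIN { printf "%.1f\n", rate * 1000 / 8 }'
}

# Hypothetical helper: best-case seconds to move SIZE_GB gigabytes over a
# RATE_GBIT link. Protocol overhead is ignored, so this is a lower bound.
transfer_seconds() {
    awk -v size="$1" -v rate="$2" 'BEGIN {
        bytes_per_sec = rate * 1e9 / 8
        printf "%.1f\n", size * 1e9 / bytes_per_sec
    }'
}

line_rate_mbytes 1       # 125.0
transfer_seconds 10 1    # 80.0
```

So even at full line rate, copying 10 GB over a single GbE link takes over a minute, which is one reason backups over a SAN's storage channels are attractive.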
- Node
A system with one or more processors sharing a common memory. More generally, a node is any device connected as part of a computer network; nodes can be computers, personal digital assistants (PDAs) or various other network appliances.
- Clusters
A cluster is a group of nodes, interconnected by a high-speed network, that work together as if they were a single machine, providing higher availability.
- Oracle Real Application Clusters [RAC]
Oracle RAC is a cluster database with a shared cache architecture. It enables Oracle Database to run real applications, for example business applications such as Oracle E-Business Suite, on clusters. If a node in the cluster fails, the database keeps running on the other nodes. Similarly, if more processing power is required, more nodes can be added to the cluster. Oracle RAC is what enables the grid to work.
Eg: The Parallel Database Cluster for Red Hat Linux is a multi-node, shared-storage cluster configuration optimized for Oracle9i Real Application Clusters running on Linux.
The database grid consists of Real Application Clusters [RAC]. Building on its predecessor, Oracle Parallel Server, Oracle introduced RAC in Oracle9i and improved it further in Oracle Database 10g. The idea of the grid is to add or remove nodes as the workload requires. Since cost is the main factor in a grid, open source technologies are used.
Eg: HP Parallel Database Cluster is a multi-node shared storage cluster architecture.
New features in Oracle Database 10g RAC include:
- Support for a larger number of nodes in a RAC cluster
- End-to-end clustering solutions such as OCFS (Oracle Cluster File System) and the volume manager ASM [Automatic Storage Management]

Before Oracle9i Enterprise Edition was introduced, clusterware [clustering software] was provided by third-party vendors or by OS vendors.
With Oracle Database 10g, Oracle introduced CRS [Cluster Ready Services], clusterware that clusters nodes together on any supported OS.
Cluster Ready Services [CRS] consists of three components:
- ocssd (Cluster Synchronization Services Daemon)
Maintains communication between the processes in a clustered environment.
- crsd (Cluster Ready Services Daemon)
Maintains the availability of cluster resources.
- evmd (Event Manager Daemon)
Publishes the events generated by the clusterware.
These run as daemons on UNIX and as services on Windows; a daemon is a program that runs without human intervention to accomplish a given task.
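To check which of these daemons are running on a node, the process list can be inspected. The sketch below reads ps -ef output on standard input and reports each daemon; it assumes the 10g binaries appear in the process list as ocssd.bin, crsd.bin and evmd.bin (the names matched by the common "ps -ef | grep d.bin" check), and the function name crs_daemon_status is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -ef` output on stdin and report whether
# each CRS daemon binary appears in it. Assumes the binaries are named
# ocssd.bin, crsd.bin and evmd.bin; matching on ".bin" avoids counting
# the init.cssd / init.crsd / init.evmd wrapper scripts as daemons.
crs_daemon_status() {
    ps_output=$(cat)
    for daemon in ocssd crsd evmd; do
        if printf '%s\n' "$ps_output" | grep -q "${daemon}\.bin"; then
            echo "$daemon: running"
        else
            echo "$daemon: not running"
        fi
    done
}

# Usage on a live system:
#   ps -ef | crs_daemon_status
```

Reading from standard input keeps the check independent of how the process list is obtained, so the same function works on saved ps output from another node.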
Disabling Cluster Synchronization Services in Linux
Scenario
Database - Oracle 10g Rel 10.1.0.2.0
Platform - Red Hat Enterprise Linux AS release 3 (Taroon Update 2) Kernel 2.4.21-15.EL
After Oracle Database 10g was installed, the following was observed:
- A program crash file (core dump) of 18.5 MB was being created every minute in the $ORACLE_HOME/css/init directory.
- Estimated disk consumption caused by this flaw = (18.5*60*24)/1024 ≈ 26 GB/day
- File name: core.<pid>
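The disk-consumption estimate scales the size of one core file to a full day. The sketch below does that arithmetic for any observed size; it assumes exactly one core file per minute and binary units (1 GB = 1024 MB), and the function name daily_core_gb is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: daily disk usage in GB if one core file of SIZE_MB
# megabytes is written every minute (1 GB = 1024 MB).
daily_core_gb() {
    awk -v size_mb="$1" 'BEGIN {
        minutes_per_day = 60 * 24
        printf "%.1f\n", size_mb * minutes_per_day / 1024
    }'
}

daily_core_gb 18.5    # 26.0
```

To see how much space the core files already occupy, du -ch "$ORACLE_HOME"/css/init/core.* | tail -n 1 prints the running total.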
The requirement is to resolve the flaw without shutting down the production database or restarting the operating system. No RAC environment is configured here, so the unnecessary processes have to be terminated.
Solution:
Running ps -ef shows that two unwanted processes are executing:
/bin/...etc/init.d/init.cssd run
/bin/...etc/init.d/init.cssd startcheck
Shut down the processes:
# /etc/rc.d/init.d/init.cssd stop
Shutting down CSS daemon
Shutting down EVM daemon
These processes need to run only if the Oracle database uses clustering for storage; for example, ASM depends on CSS even on a single node.
To remove the process from system startup:
$ su - root
Password:
# vi /etc/inittab
Comment out the following line:
h1:35:respawn:/etc/init.d/init.cssd run >/dev/null 2>&1 </dev/null
Note that with the respawn action, init restarts the cssd process whenever it terminates, even if it fails.
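The manual vi edit can also be scripted. The sketch below comments out the init.cssd run entry in an inittab-style file after saving a backup copy; it assumes the entry begins with "h1:" and calls /etc/init.d/init.cssd, and the function name disable_cssd_respawn is hypothetical. Running init q (or telinit q) afterwards asks init to re-read /etc/inittab without a reboot.

```shell
#!/bin/sh
# Hypothetical helper: comment out the init.cssd respawn entry in an
# inittab-style file (default /etc/inittab). FILE.bak keeps the original.
# Assumes the entry starts with "h1:" and runs ".../init.cssd run".
disable_cssd_respawn() {
    inittab=${1:-/etc/inittab}
    # Keep an untouched copy so the change can be reverted.
    cp -p "$inittab" "$inittab.bak" || return 1
    # Prefix the matching entry with '#'. Lines that are already commented
    # start with '#', so they do not match '^h1:' and stay unchanged.
    if ! sed 's|^h1:.*/init\.cssd run.*|#&|' "$inittab.bak" > "$inittab"; then
        cp -p "$inittab.bak" "$inittab"   # restore the original on failure
        return 1
    fi
}

# Usage (as root):
#   disable_cssd_respawn /etc/inittab
#   init q
```

Taking the file name as a parameter makes it possible to try the edit on a copy of inittab before touching the live file.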
Acronyms used
- CSS - Cluster Synchronization Services
- CRS - Cluster Ready Services, clusterware provided by Oracle to cluster nodes together on any supported operating system.
- EVM - Event Manager