Data Center Consolidation Phase 2: Architect the Target Consolidated Environment

With the increased use of high-bandwidth fiber-optic networks and cloud computing, more and more companies are looking to consolidate their data center operations. In his book Administering Data Centers: Servers, Storage, and Voice Over IP, Kailash Jayaswal provides insight into best practices for consolidating data center operations. Phase 2 of data center consolidation consists of designing and architecting the consolidated system, which is discussed in this section:

In this phase, a consolidated design is architected using data from the evaluation phase.

What are the features of a good design? First of all, it must contain solutions to most of the key problems identified earlier. The design must specify the overall organization of servers, storage devices, and network equipment, as well as the relationships between them. This stage does not name particular hardware models, server resources, equipment details, or software versions or patches. These will be specified in the implementation phase.

There are several steps in design:

  1. Analyze the collected data.
  2. List architectural requirements.
  3. Create an initial architecture.
  4. Size all equipment.
  5. Create a prototype of the proposed solution.
  6. Test the prototype.
  7. Revise the proposed architecture.
  8. Document the proposed architecture.

Design Step 1: Analyze the Collected Data

A good way to analyze the data is first to classify it into various broad categories, such as

  • Availability data
  • Maintenance-related data
  • Data center and equipment support processes
  • Staff skill set information
  • Performance data for servers, storage, or network equipment

Data gathering (in Phase 1) is different from analyzing.

Design Step 2: List Architectural Requirements

The information must now be used to create architectural requirements. The table below shows the architectural necessities that must be included in the proposed solution; it is derived from data gathered in Phase 1.

Important Data and Required Architectural Configurations

Data | Architectural Requirement
Equipment failure must not cause outages of Web services. | New consolidated design must have redundant Web servers with two front-end load-balancing devices.
Database server contains 5 TB of customer data and 6 TB of unused storage space. It is expected to grow at 20 percent per year. The database services must be available 24 × 7 with high service uptime. | You need two high-end clustered servers for the database with storage of about 30 TB.
Database must be backed up at the end of each day. There must be a full database backup once a week for off-site tape storage. | The backup server must be capable of high throughput. Schedule online, hot backups every weekday and a cold, offline backup every Saturday night.
Because of financial constraints, NAS storage must be used for the database server. However, the performance must be high. | Set up an isolated network that is dedicated to NAS storage devices and servers that use the NAS.
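
The 30 TB storage figure can be sanity-checked with a compound-growth calculation. The sketch below is only an illustration: it assumes that the 20 percent annual growth compounds once a year, and the function names are hypothetical. The source does not state a planning horizon, so the code reports how many years a given capacity would last rather than assuming one.

```python
def projected_size(current_tb: float, annual_growth: float, years: int) -> float:
    """Size in TB after `years` of growth compounded once per year."""
    return current_tb * (1 + annual_growth) ** years


def years_until_full(current_tb: float, capacity_tb: float, annual_growth: float) -> int:
    """Number of whole years of growth that still fit within capacity_tb."""
    if annual_growth <= 0:
        raise ValueError("annual_growth must be positive")
    years = 0
    size = current_tb
    # Keep adding a year while the next year's size still fits.
    while size * (1 + annual_growth) <= capacity_tb:
        size *= 1 + annual_growth
        years += 1
    return years


# Figures from the table: 5 TB of customer data, 20 percent growth, 30 TB proposed.
data_only_years = years_until_full(5, 30, 0.20)
# More conservative assumption: all 11 TB currently allocated grows at the same rate.
allocated_years = years_until_full(11, 30, 0.20)
print(data_only_years, allocated_years)
```

If only the 5 TB of customer data grows, 30 TB lasts nine years under these assumptions; if the full 11 TB of allocated space grows at the same rate, it lasts five. Which reading applies depends on how the Phase 1 data was measured.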

A central problem in any consolidated environment is keeping traffic and applications from adversely affecting others that must share the same environment. Another problem is that a single failure could bring down many services. A good architecture must therefore identify such issues and have enough resiliency to mitigate them.

Problems with Consolidated Data Center Environments and Remedies

Problems Caused By Consolidating Applications and Resources | Remedies
Failure of the central hardware would cause an outage for many applications. | Several critical applications rely on the central hardware, which must have redundant hardware. Fault tolerance is increased by having redundant components such as power supplies, fans, mirrored cache, and RAID storage. Most high-end servers, storage subsystems, and network equipment already have built-in redundancy.
Network communication for different applications contends for bandwidth on the same network pipe. | Provide a dedicated network for certain traffic. Examples are a dedicated network for backup, NAS access, and production. For certain lightweight traffic that can be grouped together for easier management, build a network with high bandwidth and fault tolerance.
Server failure causes all applications to fail. | Keep applications on separate servers. If you have a high-end server (for example, SunFire 15K, HP Superdome, or IBM p690), partition the server into independent domains, each running an independent OS instance. Cluster two or more servers that must run critical applications.
Management of security configurations for different applications becomes a nightmare in a single consolidated network. | Use virtual local area network (VLAN) configurations to separate different network interfaces within one or more servers.
Applications within a server are subjected to unwanted resource contention. | Use OS tools (such as the nice command on UNIX) or commercial scheduling software (such as BMC Software's Control-M or CA's Maestro) to intelligently schedule and allocate resources to interactive and batch jobs. Grid computing uses server profiles, user requests, and preconfigured policies to schedule workload to different servers. This is ideal for compute-intensive and number-crunching applications.
Since more network ports and users need access to applications in the centralized subnet or VLAN, several servers are impacted by a single security breach. | Security must be tightened at the OS and application level via strict user access controls. This is done by using sudo and ssh on UNIX, activity logging, encrypting all network traffic, and providing the minimum file access for users.
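
The first problem in the table, a single component whose failure takes down many applications, can be checked mechanically once the design is written down. The following is a minimal sketch, not a method from the book: the component names, application names, and the rule that "fewer than two instances" means "not redundant" are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def single_points_of_failure(
    instances: Dict[str, int],
    dependencies: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Map each non-redundant component to the applications that depend on it.

    instances:    component name -> number of redundant instances in the design
    dependencies: application name -> components the application relies on
    """
    exposed = defaultdict(list)
    for application, components in dependencies.items():
        for component in components:
            # Unknown components count as zero instances, so they are flagged too.
            if instances.get(component, 0) < 2:
                exposed[component].append(application)
    return {component: sorted(apps) for component, apps in exposed.items()}


# Hypothetical design data, for illustration only.
instances = {
    "web-load-balancer": 2,
    "db-cluster-node": 2,
    "backup-server": 1,
    "nas-switch": 1,
}
dependencies = {
    "web": ["web-load-balancer", "db-cluster-node"],
    "database": ["db-cluster-node", "nas-switch"],
    "backup": ["backup-server", "nas-switch"],
}
report = single_points_of_failure(instances, dependencies)
for component, apps in report.items():
    print(f"{component} is not redundant; affects: {', '.join(apps)}")
```

A component that appears in the report and serves several applications is a candidate for the redundancy or clustering remedies listed in the table.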

Design Step 3: Create an Initial Architecture

During this step, a solution is created. The main components are server, storage, and network architecture. General guidelines, policies, detailed technical operating procedures, training needs, and staff consolidation must be identified and documented. It is important that the initial design address all requirements identified in Design Step 2. A high degree of fault tolerance is necessary for centralized areas that support multiple services.

  • Server architecture — Each application is assigned a server with enough direct-attached, SAN-, or NAS-based storage. At this stage, it is enough to say that the server will be a high-end, medium, or low-end server. CPU, RAM, and network interfaces will be detailed in Phase 3. It is important to specify if the servers will be standalones, clustered, or a load-balanced farm.
  • Storage architecture — All application data must be placed on a centralized SAN- or NAS-based storage. Local and direct-attached storage devices must be used only for OS, application binaries, and swap space. The overwhelming benefits of using centralized SAN- or NAS-based storage have been detailed earlier in this chapter.
  • Network architecture — Set up dedicated networks for high-traffic communications such as backup, production, and NAS storage. Some networks where traffic from different servers must co-exist (such as administrative network) must have adequate bandwidth.
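
The level of detail expected at this step can be captured in a small data model. The sketch below is one possible representation under stated assumptions: the class names, field names, and sample applications are hypothetical. It records only what the three bullets above ask for (server class, deployment style, and where application data lives) and flags any application whose data would sit on direct-attached storage.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Tier(Enum):
    HIGH_END = "high-end"
    MEDIUM = "medium"
    LOW_END = "low-end"


class Deployment(Enum):
    STANDALONE = "standalone"
    CLUSTERED = "clustered"
    LOAD_BALANCED_FARM = "load-balanced farm"


class Storage(Enum):
    DIRECT_ATTACHED = "direct-attached"
    SAN = "SAN"
    NAS = "NAS"


@dataclass(frozen=True)
class ServerAssignment:
    application: str
    tier: Tier              # CPU, RAM, and interfaces are left to Phase 3
    deployment: Deployment
    data_storage: Storage   # where the application data lives


def storage_violations(assignments: List[ServerAssignment]) -> List[str]:
    """Applications whose data is not on centralized SAN or NAS storage."""
    return [
        a.application
        for a in assignments
        if a.data_storage is Storage.DIRECT_ATTACHED
    ]


# Hypothetical initial architecture, for illustration only.
plan = [
    ServerAssignment("web", Tier.MEDIUM, Deployment.LOAD_BALANCED_FARM, Storage.NAS),
    ServerAssignment("database", Tier.HIGH_END, Deployment.CLUSTERED, Storage.NAS),
    ServerAssignment("reporting", Tier.LOW_END, Deployment.STANDALONE, Storage.DIRECT_ATTACHED),
]
print(storage_violations(plan))
```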

Design Step 4: Size All Equipment

Once you have listed all the requirements for the consolidated environment, you are ready to size the servers, storage, network equipment, and other devices. The proposed capacity must be enough to meet the minimum acceptable performance levels during peak load periods. It is therefore important to use both figures (expected performance and peak loads) actively at this stage.
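
As a rough illustration of sizing against peak load, the sketch below divides the peak load by the load one server can carry while still meeting the minimum acceptable performance level, rounds up, and adds spare servers for fault tolerance. The function name, the spare-server parameter, and the sample figures are assumptions; real sizing would use the measurements collected in Phase 1.

```python
import math


def servers_needed(peak_load: float, per_server_capacity: float, spares: int = 1) -> int:
    """Servers required to carry peak_load, plus spares for fault tolerance.

    peak_load:           highest expected load (for example, requests per second)
    per_server_capacity: load one server handles at the minimum acceptable
                         performance level, in the same unit as peak_load
    spares:              extra servers so that a failure does not drop capacity
                         below the peak requirement
    """
    if per_server_capacity <= 0:
        raise ValueError("per_server_capacity must be positive")
    if peak_load < 0 or spares < 0:
        raise ValueError("peak_load and spares must not be negative")
    return math.ceil(peak_load / per_server_capacity) + spares


# Hypothetical figures: 4,500 requests/s at peak, 800 requests/s per server.
web_servers = servers_needed(4500, 800, spares=1)
print(web_servers)
```

Sizing against the average load instead of the peak would understate the requirement, which is why both figures are needed at this step.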

Design Step 5: Create a Prototype of the Proposed Solution

A prototype is a scaled-down version of the actual proposed design. It is created to discover potential problems in the proposed design, especially compatibility issues between applications when they are centralized to a few servers and networks. It is helpful to keep the prototype environment until the implementation (Phase 3) is complete.

The servers and storage used to build the prototype environment need not be the same models, with the same number of CPUs and amount of memory, that will be used in the production environment. If you have an IBM p690 server with six domains, you can use an IBM p650 server and build six domains. The size of the storage, database, and workload need not be the same as production. However, it is important that the versions of the operating systems, application binaries, and software fixes are the same as in the production environment.

In short, the prototype environment used for testing has scaled-down hardware but has software, network ACLs, and VLAN set up like the production environment.
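
Because the prototype is only useful if its software matches production, the comparison is worth automating. The following is a minimal sketch: it assumes each environment's software inventory has already been collected into a dictionary of component name to version string, and the component names and version strings shown are made-up placeholders.

```python
from typing import Dict, Optional, Tuple


def version_mismatches(
    production: Dict[str, str],
    prototype: Dict[str, str],
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return component -> (production version, prototype version) for every
    component whose version differs or that is missing from one environment."""
    mismatches = {}
    for name in sorted(set(production) | set(prototype)):
        prod_version = production.get(name)    # None if absent in production
        proto_version = prototype.get(name)    # None if absent in the prototype
        if prod_version != proto_version:
            mismatches[name] = (prod_version, proto_version)
    return mismatches


# Placeholder inventories, for illustration only.
production = {"os": "5.3", "os-fixes": "ML02", "database": "9.2.0.4"}
prototype = {"os": "5.3", "os-fixes": "ML01", "web-server": "1.3.27"}

diff = version_mismatches(production, prototype)
for name, (prod_version, proto_version) in diff.items():
    print(f"{name}: production={prod_version} prototype={proto_version}")
```

An empty result means the two inventories agree; hardware differences are deliberately ignored, in line with the scaled-down prototype described above.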

Design Step 6: Test the Prototype

Once a test prototype is built, it must be tested. The goal is to discover flaws that would be very expensive to fix later, before going to the next phase. Testing must cover the following vital areas:

  • Setup — Test whether all servers, storage devices, applications, device drivers, and network settings work well together. It is important that the applications coexist in harmony.
  • Security — The wide range of open ports and users in the same environment should not reduce the security.
  • Functionality — Application, backup, monitoring, and access from and to various networks (backup, administrative, storage) must work. Test that the results from the applications are correct.

Most of the testing must be automated via commercial packages or scripts. These tests will be necessary to verify setup during implementation.
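
A scripted test run can be as simple as a table of named checks grouped by the three areas above. The sketch below shows one way to structure such a script; the runner, the check names, and the placeholder checks are all hypothetical, and real checks would probe the prototype's servers, ports, and applications.

```python
from typing import Callable, Dict, List, Tuple

Check = Callable[[], bool]


def run_checks(checks: Dict[str, List[Tuple[str, Check]]]) -> Dict[str, Dict[str, bool]]:
    """Run every check and record True (passed) or False (failed) per area."""
    results: Dict[str, Dict[str, bool]] = {}
    for area, area_checks in checks.items():
        results[area] = {}
        for name, check in area_checks:
            try:
                results[area][name] = bool(check())
            except Exception:
                # A check that crashes is recorded as a failure, not skipped.
                results[area][name] = False
    return results


def failed(results: Dict[str, Dict[str, bool]]) -> List[str]:
    """Sorted 'area: check name' labels for every failed check."""
    return sorted(
        f"{area}: {name}"
        for area, outcomes in results.items()
        for name, ok in outcomes.items()
        if not ok
    )


def backup_network_reachable() -> bool:
    # Placeholder that simulates an unreachable backup network.
    raise ConnectionError("backup network unreachable")


# Placeholder checks, for illustration only.
example_checks = {
    "setup": [("device drivers loaded", lambda: True)],
    "security": [("unused ports closed", lambda: True)],
    "functionality": [("backup network reachable", backup_network_reachable)],
}

print(failed(run_checks(example_checks)))
```

Keeping the checks in one table makes it straightforward to rerun the same set when verifying the setup during implementation.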

Design Step 7: Revise the Proposed Architecture

At this step, you tune the proposed solution based on the test outcome. It is expected that some tests will fail and some will pass marginally. A failed test may point to serious problems in the proposal and may warrant a complete design redo.

Beware of marginal failures. Some may point to deep, systemic problems, while others can be ignored as having been caused by the use of scaled-down hardware. It is difficult to gauge how much you must change due to a marginal failure. Even if you make design changes, you must be aware of the failure modes during the implementation phase (Phase 3) and be on the lookout for their recurrence.
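
One way to keep marginal results visible is to label them explicitly instead of recording only pass or fail. The sketch below is an assumption-laden illustration: it treats the measurement as one where lower is better (such as response time), uses a hypothetical 10 percent band around the limit to define "marginal", and the function name is invented.

```python
def classify_result(measured: float, limit: float, margin: float = 0.10) -> str:
    """Classify a lower-is-better measurement against its limit.

    Results within `margin` (as a fraction of the limit) on either side of the
    limit are labeled marginal so they can be reviewed by hand.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    if not 0 <= margin < 1:
        raise ValueError("margin must be in the range [0, 1)")
    ratio = measured / limit
    if ratio <= 1 - margin:
        return "pass"
    if ratio <= 1:
        return "marginal pass"
    if ratio <= 1 + margin:
        return "marginal failure"
    return "fail"


# Hypothetical response times in milliseconds against a 200 ms limit.
for measured in (150, 195, 210, 300):
    print(measured, classify_result(measured, 200))
```

The labels do not decide whether a marginal result stems from the scaled-down hardware or from a design flaw; they only make sure such results are listed for that judgment and tracked into Phase 3.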

Design Step 8: Document the Proposed Architecture

Clear documentation of the following areas is necessary:

  • Important assessments from the gathered data
  • Architectural requirements
  • Proposed architecture
  • Prototype layout and its hardware, network, and software components
  • Test results
  • Design modifications prompted by the test results

The documentation will assist others within the organization to re-architect and consolidate their IT environment.

Continue to Phase 3: Implement the new architecture.

 

From Administering Data Centers: Servers, Storage, and Voice Over IP by Kailash Jayaswal. Copyright 2006 John Wiley & Sons, Inc. All Rights Reserved. Used by arrangement with John Wiley & Sons, Inc.