MAXIMISING ROI FROM DATA CENTER VIRTUALISATION AND CONSOLIDATION THROUGH END USER PERFORMANCE MANAGEMENT

White Paper

The business case for data center consolidation is compelling: networks, applications and services are becoming more complex, users are increasingly mobile, and businesses need to reduce costs without sacrificing performance. However, to realise these benefits, organisations need to manage a tightly integrated architecture against performance metrics. Traditional performance management tools require multiple platforms and lack the scope, perspective and timing needed to manage a consolidated environment. Additional complexities arise when applications are virtualised, making it difficult to measure performance from the end-user perspective.

Effective, ongoing management of the consolidated data center is the key to realising a return on investment from the project. Organisations need a solution which can manage performance from the perspectives of all stakeholders – business units, IT and end users – to ensure the consolidation project delivers the required ROI.

The drivers of data center consolidation

Data centers are increasingly transforming from a traditional, distributed infrastructure to a consolidated, service-oriented structure. The business case is compelling. Network infrastructures, applications and services are becoming more and more complex, while users are increasingly mobile and demanding, expecting network performance that will enhance their productivity in any task and whatever their location.

At the same time the challenging economic climate is driving organisations to reduce capital and operating costs without sacrificing quality. Engineers need to get the most from every switch, router, server and hypervisor. Data center consolidation enables organisations to implement more advanced protocols and management strategies that maximise bandwidth utilisation and performance of both network and applications. It also creates the opportunity to implement application virtualisation – separating applications from physical servers – which offers further benefits.

Other benefits of data center consolidation include:
  • Improved security, through reducing the number of sites and assets that have to be managed and laying the foundation for more sophisticated risk mitigation strategies
  • Improved compliance, through promoting automation and encouraging the implementation of a comprehensive auditing capability
  • Reduced hardware and software requirements, and reduced power consumption, facility and transport requirements, lowering capital and operating costs and the organisation’s carbon footprint

To achieve these objectives, servers are being virtualised and 40 Gigabit links installed to support the consolidation of bandwidth-hungry applications, which drive up the cost of network interfaces and put greater demands on the cabling infrastructure.

Planning consolidation to maximise benefits

Despite the potential benefits of data center consolidation, organisations need appropriate planning and evaluation if they are to achieve the expected return on investment (ROI). Changes must be made seamlessly, with minimal downtime to production business applications, and the resulting consolidated data center must deliver increased performance to justify the time and capital required to implement the project.

The cost of downtime
  • Average cost per minute of data center downtime: $5,600
  • Average reported downtime per incident: 90 minutes
  • Average cost per incident: $505,000
  • A complete data center outage, with an average recovery time of 134 minutes, costs approximately $680,000
  • A partial data center outage, averaging 59 minutes, costs approximately $258,000

Source: Ponemon Institute, 2011¹

To achieve the full benefits of consolidation and avoid expensive downtime, organisations need to follow a clear process:

  • Obtain an in-depth understanding of their existing network, applications and services, and benchmark performance
  • Set metrics for the desired performance of the consolidated data center
  • Plan the transition
  • Implement the transition to the new operating environment with minimum downtime
  • Monitor and manage the updated architecture to ensure it achieves the required metrics

According to a survey from analysts Forrester Research, consolidation projects typically take 18-24 months to complete². During this time, organisations need to dedicate resources and budget to provide staff with the hardware and software to assess the existing operating environment, plan the migration, bring the new architecture online and manage performance.

Clear benchmarks are vital. Without metrics for pre- and post-consolidation performance, organisations cannot measure the ROI. These need to look at the impact on all stakeholders – business unit owners, IT and operations staff, corporate management and end users. If applications are virtualised, measuring performance from an end user perspective becomes more difficult.

So what are the key areas to benchmark? When Forrester asked 147 US enterprises that had completed or were actively implementing a data center consolidation project for the top five metrics they were using to measure success, 52% cited operational cost, followed closely by total cost of ownership (44%), percentage of budget saved (38%), application performance versus infrastructure cost (35%) and performance per CPU core (34%).

To achieve – and demonstrate the achievement of – the ROI for a consolidation project, organisations need to address three areas: reporting, performance management and personnel. In the rest of this paper we will focus on two of these – reporting and performance management, which are closely interlinked.

Implementation challenges

1. Reporting
With data center consolidation, resources that were previously distributed across the enterprise are gathered into a common pool. As a result, business units that once managed and maintained their own networks, applications and services have to relinquish control to a central team.

The business units now in effect become internal customers of the consolidated data center. To continue supporting the consolidation, they need to be assured that their critical applications are performing at or above the same levels as when they were controlled by the business unit. This means establishing internally facing service level agreements between the data center and the business units. Metrics such as application availability and end user response time for transactions between the desktop and the data center should be compiled, tracked and reported regularly to provide the evidence necessary to keep business unit owners on board.
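To illustrate what compiling such service level metrics can involve, the following minimal Python sketch summarises availability and end user response time per business unit from transaction records. The record layout and field names are assumptions for illustration only, not any particular product's data model.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Transaction:
    business_unit: str   # hypothetical fields, for illustration only
    succeeded: bool      # did the transaction complete successfully?
    response_ms: float   # end user response time in milliseconds

def sla_report(transactions: list[Transaction]) -> dict[str, dict[str, float]]:
    """Summarise availability and end user response time per business unit."""
    report: dict[str, dict[str, float]] = {}
    for unit in {t.business_unit for t in transactions}:
        rows = [t for t in transactions if t.business_unit == unit]
        ok = [t for t in rows if t.succeeded]
        times = sorted(t.response_ms for t in ok)
        report[unit] = {
            "availability_pct": round(100.0 * len(ok) / len(rows), 2),
            "avg_response_ms": round(mean(times), 1) if times else float("nan"),
            # Approximate 95th percentile, read directly from the sorted list.
            "p95_response_ms": times[int(0.95 * (len(times) - 1))] if times else float("nan"),
        }
    return report

if __name__ == "__main__":
    sample = [
        Transaction("finance", True, 120.0),
        Transaction("finance", True, 310.0),
        Transaction("finance", False, 0.0),
        Transaction("sales", True, 95.0),
    ]
    print(sla_report(sample))   # availability and response times per unit
```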

Why business units care about network performance

  • Is the POS system performance retaining customers or losing them? Forty per cent of customers will abandon a website after one or two bad experiences.
  • In a dealing room in New York, London or Hong Kong, 1ms of network delay can make a $1 million difference to each transaction.
Service level metrics are also required for usage and billing. Business unit owners will naturally only wish to pay for the resources they actually use, rather than subsidising other business units by paying an evenly divided share of data center costs. Reporting should therefore include usage assessment and the corresponding chargeback for all networks, applications and services consumed by each business unit.
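As a minimal sketch of the chargeback calculation described above, the following Python function allocates data center costs in proportion to each business unit's measured usage rather than splitting them evenly. The cost figure, usage measure (here gigabytes transferred) and unit names are assumptions for illustration.

```python
def chargeback(total_cost: float, usage_by_unit: dict[str, float]) -> dict[str, float]:
    """Allocate data center cost in proportion to measured usage,
    rather than splitting it evenly across business units."""
    total_usage = sum(usage_by_unit.values())
    if total_usage == 0:
        # Fall back to an even split if no usage was recorded.
        share = total_cost / len(usage_by_unit)
        return {unit: round(share, 2) for unit in usage_by_unit}
    return {
        unit: round(total_cost * used / total_usage, 2)
        for unit, used in usage_by_unit.items()
    }

# Example: a monthly cost of $90,000 split by gigabytes transferred per unit.
print(chargeback(90_000.0, {"finance": 1_200.0, "sales": 600.0, "hr": 200.0}))
```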

2. Performance management
While data center consolidation and the application virtualisation that often accompanies it may streamline enterprise architecture, they introduce management complexity. As more services are virtualised, it becomes increasingly difficult to provide a single view of application usage from data center to desktop, because a single physical server can host multiple virtual machines. With database servers, application servers, email servers, print servers and file servers all potentially sharing the same piece of hardware, tracking network, application and service performance becomes much more difficult.

Finding the right management tool (or tools) is another challenge. Most legacy performance management tools operate best in a silo, as they focus on a specific application, service, or geographical or logical slice of the network. This approach may be acceptable in a distributed architecture – although problems can hide in the gap between an NMS that lacks comprehensive information and complex packet capture tools – but it causes problems in a consolidated data center, where the number of silos will grow with the addition of application virtualisation management tools that have not yet been integrated with the legacy performance management tools.

In this situation, network engineers have to rely on a set of disparate tools, each with its own unique capabilities and user interface. They have to use their collective experience and expertise to manually correlate information in order to identify, isolate and resolve problems.

In the best case scenario, performance management is carried out in a similar manner to the distributed environment, bypassing the opportunity to capitalise on collocated information and personnel. In the worst case, it results in finger pointing between the operations and IT teams and lowers the efficiency of anomaly resolution, causing problems for both end users and management.

To address these issues, and unlock the full potential of consolidation, organisations need to find a better way of managing performance and reporting.

Consolidating performance management

A consolidated performance management solution will provide information on all aspects of the network to all parties. This will assist in effective problem resolution without finger pointing, as well as providing the data to calculate reporting and management metrics such as SLA performance, usage and billing.

However, performance management is the most difficult step in the consolidation process. Legacy performance management tools were designed for a distributed environment and cannot handle the complexities of a consolidated and virtualised architecture. Tools such as application flow monitoring, transactional views, packet analysis, SNMP polling and stream-to-disk (S2D) archiving require multiple platforms and thus potentially offset the advantages of consolidation.

Businesses need an end-to-end solution with the scalability, breadth and depth to acquire, integrate, present and retain information that truly reflects the performance of the networks, applications and services from the business unit, IT and, most importantly, end-user perspective. To be effective, it needs three critical characteristics: scope, perspective and timing.

Scope
Traditional performance management tools fall into two categories. Some take a high-level approach and skim the surface in data gathering and assessment. They generate dashboards that can be shared with senior management to track overall performance, but do not give visibility into specific areas or assist with problem-solving. The alternatives take a much narrower, deep-dive approach, focusing on a specific segment of the network and capturing packets, examining individual transactions and delivering detailed, real-time analytics.

Ideally, IT teams need a combination of the two approaches. Flow, transactional and SNMP data enables them to examine the overall experience, while packet analysis and S2D capabilities assist in troubleshooting and compliance. They need both the breadth and depth of analysis, but without the manual effort and time associated with point products.
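The workflow this implies can be sketched in a few lines of Python: flow-level summaries provide the breadth to flag a suspect conversation, and a stream-to-disk archive provides the depth to retrieve its packets for detailed analysis. The record layouts and field names here are illustrative assumptions, not any vendor's data model.

```python
from dataclasses import dataclass

@dataclass
class FlowRecord:
    src: str
    dst: str
    app: str
    avg_response_ms: float   # hypothetical summary metric from flow data

@dataclass
class Packet:
    ts: float
    src: str
    dst: str
    payload_len: int

def flag_slow_flows(flows: list[FlowRecord], threshold_ms: float) -> list[FlowRecord]:
    """Breadth: skim flow summaries to find conversations worth a deep dive."""
    return [f for f in flows if f.avg_response_ms > threshold_ms]

def pull_packets(archive: list[Packet], flow: FlowRecord) -> list[Packet]:
    """Depth: retrieve the matching packets from a stream-to-disk archive
    for detailed, transaction-level analysis."""
    return [p for p in archive if {p.src, p.dst} == {flow.src, flow.dst}]
```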

Perspective
Legacy performance management tools are limited by both the information they provide and the way they present that information.

Network and application viewpoints help to identify the root cause of a problem and resolve it but are not always sufficient, particularly in a consolidated data center where business unit owners require service level metrics.

For example, when an internal or external customer reports unacceptably slow application response times, the best way to confirm the situation and diagnose the problem is for the network engineer to view the network from the user’s perspective. This only becomes possible if the performance management solution has the breadth and depth of analysis discussed above.

Timing
In an ideal world, when performance problems arise, the root cause is identified quickly and the situation rapidly resolved. However, this becomes more difficult in the complexity of a consolidated data center, particularly if performance degrades slowly over time or problems are intermittent.

The network engineer needs to gather granular performance information from all data sources across the entire network over an extended time period and present the information from the end user’s perspective. This enables operations and IT staff to carry out real-time analysis and to go back in time to discrete points in order to assess and correlate environments associated with intermittent error reports. It also supports the development of short, medium and long term performance baselines, enabling deviations to be identified and addressed as early as possible.
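A baseline-and-deviation check of the kind described here can be sketched very simply: build a baseline from historical response times and flag the latest measurement if it drifts too far from it. The threshold choice (three standard deviations) and the sample values are assumptions for illustration.

```python
from statistics import mean, pstdev

def deviates_from_baseline(history_ms: list[float], latest_ms: float,
                           sigma: float = 3.0) -> bool:
    """Flag the latest end user response time if it falls outside the
    baseline built over an extended period (mean +/- N standard deviations)."""
    base = mean(history_ms)
    spread = pstdev(history_ms)
    return abs(latest_ms - base) > sigma * spread

# Example: a slowly degrading application crosses the threshold.
history = [110.0, 112.0, 108.0, 115.0, 111.0, 109.0, 113.0]
print(deviates_from_baseline(history, 160.0))   # True: investigate early
```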

An end-to-end performance management solution should address all three of these issues. It needs to collect, aggregate, correlate and mediate all data, including flow, SNMP data and information gathered from other devices, with granularity down to one millisecond. This data should be displayed through a single, user-configurable dashboard. This will enable performance to be measured, issues identified and resolved quickly, and provide the visibility needed to support network optimisation. By implementing an appropriate performance management solution prior to data center consolidation, the IT team can ensure that performance is at a minimum maintained and ideally improved following the consolidation project.

Virtualisation adds performance management complexity

The additional layer of abstraction inherent to application virtualisation makes performance management more difficult because there is usually less physical evidence available than in the traditional environment in which servers and applications are tightly coupled.

Migration to a virtual network infrastructure requires network engineers to adopt new configuration and monitoring methodologies, as there are fewer physical switches and routers to support. There is also an ongoing debate about whether virtualisation makes systems more or less secure: if one system in the virtual environment is compromised, does that give an attacker access to all the others? In addition, the increase in physical traffic concentrated onto a single hardware platform places greater demands on the cabling infrastructure. This infrastructure should be tested and fully certified before rolling out virtualised services.

Visibility and security within the virtual environment are huge concerns. Before, during and after migration, it is critical to use SNMP, NetFlow and virtual taps to monitor the health, connectivity and usage of these now-virtual systems. On some platforms, servers can be moved and migrated automatically to make more efficient use of hardware resources, so server inventory and location must be monitored closely.
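One simple way to keep that inventory current, sketched below in Python, is to compare successive snapshots of the VM-to-host mapping (however they are gathered, for example via SNMP or the hypervisor's management interface) and report anything that has appeared, disappeared or moved. The snapshot format and host names are assumptions for illustration.

```python
def detect_migrations(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Compare two inventory snapshots mapping VM name -> physical host and
    report VMs that appeared, disappeared or moved since the last poll."""
    events = []
    for vm, host in current.items():
        if vm not in previous:
            events.append(f"{vm}: new VM on {host}")
        elif previous[vm] != host:
            events.append(f"{vm}: migrated from {previous[vm]} to {host}")
    for vm, host in previous.items():
        if vm not in current:
            events.append(f"{vm}: no longer reported (was on {host})")
    return events

# Example: the platform has automatically rebalanced one server.
print(detect_migrations({"mail01": "hostA", "db01": "hostB"},
                        {"mail01": "hostC", "db01": "hostB", "web01": "hostA"}))
```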

Using a virtual tap or traffic mirroring port, application traffic should be monitored and analysed for server response time and irregular behaviour. Because virtualisation is still relatively new compared with other systems in the IT organisation, it is often the first thing blamed when problems occur. For this reason, 24x7 monitoring tools should be used to quickly isolate problems to either the physical network or the virtual environment.
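As a minimal illustration of measuring server response time from mirrored traffic, the sketch below pairs request and response timestamps observed on a tap and reports the delay for each transaction; outliers can then be flagged as irregular behaviour. The event format is a simplification assumed for this example.

```python
def server_response_times(events: list[tuple[float, str, str]]) -> dict[str, float]:
    """Given (timestamp, request_id, direction) events observed on a mirror
    port or virtual tap, pair each 'req' with its 'resp' and return the
    server response time per request."""
    pending: dict[str, float] = {}
    times: dict[str, float] = {}
    for ts, req_id, direction in sorted(events):
        if direction == "req":
            pending[req_id] = ts
        elif direction == "resp" and req_id in pending:
            times[req_id] = ts - pending.pop(req_id)
    return times

# Example: the second transaction is markedly slower and worth investigating.
events = [(0.000, "a", "req"), (0.045, "a", "resp"),
          (1.000, "b", "req"), (2.300, "b", "resp")]
print(server_response_times(events))
```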

There are specific challenges in managing the user experience when accessing applications held in a virtualised Citrix environment. Due to the added complexity of this environment, the entire transaction, from the user through multiple tiers of application structure, must be monitored, baselined and managed. Understanding this requires an understanding of how Citrix changes the application architecture.

As a user enters a VDI (virtual desktop infrastructure) session, they engage with the Citrix XenDesktop/XenApp server or servers, which host virtual sessions with configured access to specific services as defined and configured by the administrator. These rights often rely on an outside interaction with Active Directory, a separate transaction in which the Citrix access gateway (through the advanced access control web server) acts as the client. From there, the user gains access to a session with their Citrix solution. This session consists largely of ‘screen scrapes’, or emulation-layer traffic.

Within the payload, there is additional insight into how the end user is interacting with their virtual desktop. Those interactions generate additional transactions in the subsequent application tiers, with the Citrix server acting as the client in a more standard n-tier application interaction within the established service architecture.

As the user transactions are handed off, end-to-end correlation of transactions can be difficult, due to the proxied nature of the application architecture. Information indicating the user and user actions is contained in ICA traffic to the Citrix XenApp servers, but it is nested within the payload. Once Citrix generates sessions to back-end application infrastructures, the only real way to correlate is by time and the applications that were accessed.
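Because the user identity stays nested in the ICA payload and back-end requests are proxied by the Citrix servers, a time-and-application correlation of the kind described above might look like the following sketch. The record layouts are hypothetical and deliberately simplified (for example, overlapping sessions for the same application would need extra disambiguation).

```python
from dataclasses import dataclass

@dataclass
class IcaSession:
    user: str
    app: str        # published application name
    start: float    # session start time (epoch seconds)
    end: float      # session end time

@dataclass
class BackendTx:
    app: str
    ts: float
    duration_ms: float

def correlate(sessions: list[IcaSession], txs: list[BackendTx]) -> dict[str, list[BackendTx]]:
    """Attribute back-end transactions to Citrix users by matching on the
    application accessed and the time window of the user's ICA session."""
    result: dict[str, list[BackendTx]] = {s.user: [] for s in sessions}
    for tx in txs:
        for s in sessions:
            if tx.app == s.app and s.start <= tx.ts <= s.end:
                result[s.user].append(tx)
    return result
```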

This means that when the user calls the Helpdesk saying that they believe the network to be down, they may actually be experiencing delays with an application hosted through Citrix. It can take an engineer up to an hour to find out what is actually happening. The only way to understand what may be impacting the end user experience is to implement performance monitoring of these applications from the perspective of the network. In a consolidated environment, this is the transport at the back end between the end user and the data center.

Solutions such as those from VMware provide tools to monitor the virtualised environment and servers, but not the end user experience and the network. In contrast, NETSCOUT has developed solutions which measure from the end user into the data center, enabling users to understand what is happening on the network from the end user perspective and hence identify and resolve issues more quickly.

These solutions have the ability to provide visibility into this front tier performance, as aggregated by site with per user comparisons, as well as views into the interactions the user had within their session, and per published application performance metrics. This is then correlated with transactions that are generated by the Citrix environment to the standard n-tier application architectures. It both saves time when troubleshooting issues and helps network engineers to become proactive in managing the performance of this application delivery scenario.

References

¹ Understanding the Cost of Data Center Downtime, Emerson Network Power & Ponemon Institute.

² Cost analysis and measurement help ensure consolidation success, Forrester Research, January 2009.

 
 