Top 10 Mistakes Network Engineers Make When Troubleshooting | enterprise.netscout.com

There are pitfalls that are easy to fall into when troubleshooting network and application problems.
This white paper describes 10 of these and how to avoid them.

  • Table of contents
  • Making assumptions
  • Rebooting
  • Upgrading
  • Validating
  • No baseline
  • Wireless challenges
  • Under-monitoring
  • Understanding core technologies
  • Laptop limits
  • Conclusion
 

100% uptime? Check.

10Gbps to the desktop? Getting there.

Problem-free network and applications? Ha! In our dreams.

Until we reach network perfection (which, let's be honest, may never really happen) engineers will face problems with network systems and the applications they support. Whether the issue is slow performance, poor voice/video quality, dropped connections, or other events that plague networks today, engineers need to continually hone their troubleshooting skills to stay on top of these business efficiency-killers. Additionally, they need to avoid the common pitfalls that all too many network engineers fall into when troubleshooting problems.

Let’s look at a few examples.

1. Assuming the root cause of a problem

Let's face it: we humans make assumptions based on what we think we know. When a problem strikes, we can't help jumping to conclusions, especially if we have both time and experience in a particular network environment. However, making assumptions can be a huge mistake. They can lead to nonsensical network changes, costly upgrades, and baseless "improvements", all with our fingers crossed, hoping the problem will go away. This troubleshooting mistake should be avoided at all costs. Instead, gather the facts about the problem before making these knee-jerk decisions. Fully understand the who, what, where, why, and how of the problem before changing anything. Let facts guide every decision.

 

2. "This fix worked before, let's try it again" Troubleshooting

Similar to mistake number one, this common response to network problems is also based on assumptions. We are all victims of our own experience, so it's easy to rely on our knowledge of what worked last time, thinking that the same will be true again. In many cases, a new problem will show the same symptoms as a previous one, but the root cause could be entirely different.

Before changing anything, make sure to isolate the problem domain to the network, server, application, or client. Clearly identify which component is to blame before trying the guess-and-change approach. Using tools that make use of SNMP, NetFlow, and packet capture, clearly isolate the problem to a layer before moving forward with the resolution.
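One rough way to start separating network delay from server or application delay is to time the TCP handshake and the wait for the first reply byte separately. The sketch below is only an illustration, not a substitute for SNMP, NetFlow, or packet analysis; the function names are hypothetical, and it assumes the connect time approximates one network round trip.

```python
import socket
import time


def time_tcp_exchange(host, port, request, timeout=5.0):
    """Return (connect_seconds, first_byte_seconds) for one request."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        connected = time.perf_counter()
        sock.sendall(request)
        sock.recv(1)  # blocks until the first reply byte (or a close) arrives
        first_byte = time.perf_counter()
    return connected - start, first_byte - connected


def estimate_server_time(connect_s, first_byte_s):
    """Subtract one round trip (approximated by connect time) from the wait."""
    # Assumption: whatever remains is mostly server/application think time.
    return max(0.0, first_byte_s - connect_s)
```

For example, a 20 ms connect followed by a 250 ms wait for the first byte suggests roughly 230 ms spent on the server side, which points away from the network. A call such as `time_tcp_exchange("app.example.internal", 80, b"HEAD / HTTP/1.0\r\n\r\n")` would use a placeholder host name, not one from this paper.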

3. Rebooting the Problem Away

From home routers to 10G switches, almost all electronic devices need to be rebooted at one time or another. It's just a part of how things work today. However, in some IT environments, rebooting a device has become the standard for first-step troubleshooting. This is especially true if a device or server reboot has worked in the past.

If rebooting a device resolves a problem, the fix may only be temporary, requiring another reboot in the near future. Of course, a reboot could be required after a software upgrade, patch, or configuration change. However, as a first response to a network problem, repeatedly rebooting a device will only mask the real root cause. Prior to rebooting a device, collect as much information as possible. For example, is the access point still responding to current users? Is the server accepting new TCP connections? Is the switch CPU at 100% utilization? This information may steer engineers toward the real root cause, rather than the temporary fix.
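As one example of collecting evidence before a reboot, the following minimal sketch records whether a server still accepts new TCP connections and how long each attempt takes. The helper names and the row layout are assumptions made for illustration; access point and switch CPU checks would need other tools, such as SNMP polling.

```python
import socket
import time


def accepts_tcp(host, port, timeout=3.0):
    """Return (accepted, seconds) for a single TCP connection attempt."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.perf_counter() - start
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False, time.perf_counter() - start


def snapshot(targets):
    """Probe each (host, port) pair and return timestamped evidence rows."""
    rows = []
    for host, port in targets:
        accepted, seconds = accepts_tcp(host, port)
        rows.append({
            "time": time.strftime("%Y-%m-%dT%H:%M:%S"),
            "host": host,
            "port": port,
            "accepted": accepted,
            "seconds": round(seconds, 4),
        })
    return rows
```

Saving the output of `snapshot(...)` before and after a reboot gives a simple record to compare if the problem returns.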

 

4. Upgrading to solve the problem

Upgrading from 1Gbps to 10Gbps improves performance tenfold, right?

No.

Seldom is this the case. All too often, when faced with network problems, especially ones involving slow performance, network engineers are tempted to increase WAN bandwidth, upgrade switches or routers, or implement acceleration technologies. It’s no secret: none of these "fixes" are free. In fact, upgrading as a first response to a problem can drain the budget, frustrate managers, reduce business productivity, and at worst, cost the job of a network engineer (yikes!).

Before implementing a new technology or upgrading a system/device/connection, there are several important questions to answer: Why are we convinced that this device/technology improvement will resolve the issue? What IS the original issue? Is the problem really rooted in network capacity or latency?
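A back-of-the-envelope calculation can help answer the capacity-versus-latency question. The sketch below uses the textbook bound that a single TCP flow cannot exceed one window per round trip. It is a simplification that ignores slow start, packet loss, and window scaling, and the 65,535-byte default (the largest window without the scaling option) is an assumption chosen for illustration.

```python
def transfer_seconds(size_bytes, link_bps, rtt_s, window_bytes=65535):
    """Estimate bulk transfer time for one TCP flow with a fixed window."""
    # At most one window of data can be in flight per round trip.
    window_limit_bps = window_bytes * 8 / rtt_s
    # The flow runs at whichever limit is lower: the link or the window.
    effective_bps = min(link_bps, window_limit_bps)
    return size_bytes * 8 / effective_bps


# 100 MB file over a path with a 50 ms round-trip time
before = transfer_seconds(100e6, 1e9, 0.050)    # 1 Gbps link
after = transfer_seconds(100e6, 10e9, 0.050)    # 10 Gbps link
print(f"1 Gbps: {before:.1f} s, 10 Gbps: {after:.1f} s")
```

Under these assumptions both links take about 76 seconds, because the flow is limited by latency rather than capacity, so the upgrade changes nothing for this transfer.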

While it’s nice to have new gear on the network, it’s not pleasant to see the look on a manager’s face when an expensive solution fails to address a problem. Upgrading key systems is warranted from time to time, but be careful when upgrading a device as a troubleshooting step.


 

5. Delivering new connections to users without validating them

We've all done it a million times. Unbox and configure a new switch, install it, patch in the uplink, connect the end user drop and watch the light go blinky-blinky.

Done, right?

No. There are several things that can impact the performance experienced by end users once they connect and get to work. Link negotiation, cable problems, interface hardware issues and other throughput killers can impact the connection.

Before officially delivering a link to an end user, it should be tested and validated. This includes measuring latency and throughput for each connection back to the core/data center. As we mentioned, most engineers will connect a link, look for a link light, send a ping, and consider the link tested. However, all of the issues described earlier would pass this test. Only a full performance test would validate the connection and reveal these problems before the users experience them.
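Once latency and throughput have been measured with a proper test tool, the acceptance decision itself can be made explicit. The following sketch shows one hypothetical way to do that; the function name, the thresholds, and the sample figures are illustrative assumptions, not values from this paper.

```python
def validate_link(latency_ms, throughput_mbps, max_latency_ms, min_throughput_mbps):
    """Return a list of failure messages; an empty list means the link passed."""
    failures = []
    worst = max(latency_ms)
    if worst > max_latency_ms:
        failures.append(
            f"worst latency {worst:.1f} ms exceeds {max_latency_ms:.1f} ms"
        )
    if throughput_mbps < min_throughput_mbps:
        failures.append(
            f"throughput {throughput_mbps:.0f} Mbps is below "
            f"{min_throughput_mbps:.0f} Mbps"
        )
    return failures


# A gigabit drop that negotiated down to 100 Mbps still answers ping quickly.
print(validate_link([0.4, 0.5, 0.6], 94.0, 2.0, 900.0))
```

In this made-up case the latency check passes, just as a ping would, while the throughput check exposes the negotiation problem.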

 

6. Failure to create a baseline during normal network performance

When troubleshooting a problem, engineers often utilize monitoring tools to help them collect and interpret information about the network. Even though these tools can display an impressive amount of statistical data, it’s easy to get lost in the details if a “normal” baseline does not exist.

Before a problem strikes, effort should be made to properly baseline the network. This would include collecting traffic utilization and latency statistics on key network links, response time measurements on critical business applications, packet capture samples including typical conversations and protocols, and a complete wireless assessment. These reports will assist network engineers when an issue arises since they will know what "normal" is.
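As a minimal illustration of putting a baseline to use, the sketch below summarises response-time samples taken during normal operation and flags later measurements that fall far outside that range. The three-standard-deviation rule and the sample values are assumptions for the example; a real baseline would also account for time of day and day of week.

```python
import statistics


def build_baseline(samples):
    """Summarise normal-period samples as mean and standard deviation."""
    return {
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples),
    }


def is_abnormal(value, baseline, k=3.0):
    """True when value is more than k standard deviations from the mean."""
    return abs(value - baseline["mean"]) > k * baseline["stdev"]


# Hypothetical application response times (ms) collected on a normal day
normal_ms = [20, 22, 19, 21, 23, 20, 21, 22]
baseline = build_baseline(normal_ms)
print(is_abnormal(45, baseline), is_abnormal(22, baseline))
```

With these sample values, a 45 ms response is flagged as abnormal while a 22 ms response is not.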


 

7. Lack of wireless tools and experience

Wireless can be a real pain, especially as more end user devices ditch the cable and go 100% Wi-Fi. This trend, as well as the increase in the voice and video applications these devices demand, has greatly elevated the scope and complexity of wireless environments. Even when these systems are implemented and maintained by seasoned RF experts, clients can still experience poor performance, network disconnections, and other frustrating issues.

Because wireless environments are susceptible to performance problems, they are often the first to be blamed when a new event occurs. Many network engineers point the finger at the Wi-Fi simply because it is an area of the network they don't fully understand or lack the tools to analyze. Rather than have a huge network blind spot, network managers should invest in both tools and training to get engineers up to speed on wireless, equipping them to respond to problems in this domain.

 

8. Under-monitoring the network

Problems that engineers face today are complex, intermittent, and manage to hide in the shadows of the system. Up/down ping-based tools used to be all that was needed to monitor the network. That has changed a great deal.

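Solving today's problems requires a monitoring system that is both network- and application-aware and that pulls out all the stops for visibility, using SNMP, NetFlow, and packet capture. Such a system watches applications 24/7/365, so that intermittent problems are caught while they are active rather than missed while the monitoring system is looking the other way.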

 

9. Misunderstanding the operation of core technologies

What do spanning-tree, ARP, auto-negotiation, ICMP redirects and IP fragmentation have in common?

They are all old (20+ years for each) and absolutely critical for network operation. Well, maybe not IP fragmentation in every case but it was worth the mention. Network engineers need to ensure that they understand the core technologies that their state-of-the-art systems are built on. When prepping for that next vendor certification exam, don't leave out the protocols and technologies that still have a hand in keeping things running this year, and beyond.


 

10. Capturing packets with laptop hardware

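Packet capture and trace file interpretation are the gold standard for digging deep into the details when investigating a problem. This method of analysis is critical for finding the root cause of a problem, rather than simply exonerating the network and throwing the problem over the wall.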

When it comes to packet collection, a common mistake made by network engineers today is misunderstanding the limits of the hardware they are using to capture. Take Wireshark, for example. This open source tool is known and loved by engineers around the world and is one of the most downloaded networking tools. However, most people run it on laptops or untested hardware that cannot keep up with high-speed traffic streams. In fact, most standard laptops struggle to capture cleanly at rates above 100Mbps!

Before capturing in a data center environment, know the limits of the hardware used to collect packets. Missing packets from trace files can easily lead an engineer astray, increasing the time to resolution of a nagging problem.
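To get a feel for what capture hardware is up against, it helps to work out how many frames per second a fully loaded Ethernet link can carry. The sketch below uses standard Ethernet framing overhead (8 bytes of preamble and start-of-frame delimiter plus a 12-byte inter-frame gap); the function name is hypothetical, and real traffic mixes will fall somewhere between the two extremes shown.

```python
def max_frames_per_second(link_bps, frame_bytes=64):
    """Theoretical Ethernet frame rate at full line rate for one frame size."""
    # Each frame also occupies 8 bytes of preamble/SFD and a 12-byte gap.
    wire_bytes = frame_bytes + 20
    return link_bps / (wire_bytes * 8)


for label, rate in (("1 Gbps", 1e9), ("10 Gbps", 10e9)):
    small = max_frames_per_second(rate, 64)     # minimum-size frames
    large = max_frames_per_second(rate, 1518)   # maximum standard frames
    budget_ns = 1e9 / small                     # time available per frame
    print(f"{label}: {small:,.0f} to {large:,.0f} frames/s, "
          f"about {budget_ns:.0f} ns per small frame")
```

At 1 Gbps with minimum-size frames this works out to roughly 1.49 million frames per second, or about 672 nanoseconds to receive, timestamp, and store each one, and a tenth of that at 10 Gbps.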

 

Conclusion

This is not an exhaustive list; there are other pitfalls that engineers of all experience levels fall into from time to time. With a little bit of preparation and awareness of some common mistakes, engineers can reduce time to resolution, trim frustration, cut unnecessary expense, and avoid the headaches brought on when troubleshooting network problems.

 
 