Technical article: The hidden cost of wasted bandwidth, and a system troubleshooting checklist
2004-06-10
Printed from: Anheng (安恒公司)
Address: HTTP://gentoo.anheng.com.cn/news/article.php?articleid=306

Two perspectives on troubleshooting, from Cabling Network Systems and The Globe and Mail: the hidden cost of wasted bandwidth, and a system troubleshooting checklist

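Cabling Network Systems, "Network Troubleshooting": the key lies in understanding the root of a problem. Ron Groulx, a product specialist with Fluke Networks Canada, argues that successful troubleshooting rests on good insight, formal training, and practical experience, and that a basic troubleshooting checklist can shorten the learning curve.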

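The Globe and Mail: Brad Masterson, product manager for Fluke Networks Canada, describes how users tried to solve network problems by adding bandwidth, only to discover that the root cause lay in another part of the network and needed just a simple, inexpensive fix.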

 

 

Cabling Network Systems: Network troubleshooting, where the key lies in understanding the root of a problem.


CNS Magazine, March/April 2004

Network Troubleshooting

While it can be a complex chore, the key lies with understanding the root of a problem.

By Ron Groulx

The key to successful troubleshooting is knowing how the network functions under normal conditions, since it enables a technician to quickly recognize abnormal operation.

Any other approach is little better than a shot in the dark.

While the foundation of good troubleshooting is based on insight, formal training and practical experience, the following information can help shorten the learning curve on isolating and solving network problems.

Before the onsite visit: Technicians can save considerable time and resources by determining ahead of time whether an on-site visit is required. Even with continuous improvement being made in operating system software reliability, "reboot your PC" is still the first step.

Information can also be gathered over the phone with the help of the user. Most users can open a command prompt and report back to the technician the result of an IPCONFIG command.

This tells the technician whether the PC has an appropriate address for the subnet to which it is physically connected.

Have the user release and renew the DHCP lease (IPCONFIG /RELEASE followed by IPCONFIG /RENEW), then attempt to use the network with the fresh IP address. If the IPCONFIG command reports that the DHCP operation cannot be performed, then the user is probably using a static IP configuration.
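
The address check described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original article: the function name and the example subnet are hypothetical, and the link-local test relies on the fact that a Windows PC that gets no answer from a DHCP server assigns itself an address in 169.254.0.0/16.

```python
import ipaddress

# IPv4 link-local range: a PC self-assigns from it when DHCP gets no answer.
LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")


def check_reported_address(reported_ip: str, expected_subnet: str) -> str:
    """Classify the address a user reads back from IPCONFIG."""
    address = ipaddress.ip_address(reported_ip)
    subnet = ipaddress.ip_network(expected_subnet)
    if address in LINK_LOCAL:
        return "self-assigned: DHCP probably failed"
    if address in subnet:
        return "ok: address fits the subnet"
    return "mismatch: address is outside the subnet"


if __name__ == "__main__":
    # Hypothetical subnet serving the user's wall jack.
    for ip in ("192.168.10.57", "169.254.12.9", "10.0.0.5"):
        print(ip, "->", check_reported_address(ip, "192.168.10.0/24"))
```

A mismatch is worth a second question to the user before a site visit, since it often means the PC was moved or carries a leftover static configuration.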

If the user has reported a valid IP address, try pinging that address from your desk. If the user's PC responds, then have the user attempt some other network activity, such as opening a Web page or pinging the local router to verify basic connectivity.

Verifying the problem on site: If a site visit is necessary, it is important to question the user about any action or activity that may have affected network performance, including any recent changes (for example, moving office furniture or installing a new screen saver).

The next step is to repeat the tests the user performed previously over the telephone.

A successful ping to a network server or off-net device immediately confirms that the workstation has Layer 3 connectivity to the network, which means the lower layers are working and need no separate testing. If Layer 3 connectivity cannot be validated, then troubleshooting must start at the Physical layer, Layer 1.

Extended troubleshooting: Once the inability to log into the network has been verified, the next step is to determine whether the issue relates to the network or the user's PC. To verify this, the technician must determine whether the cable connecting the client to the network is in place and/or functioning properly.

Solving network problems in a timely, cost-effective manner at this point requires a tool that can quickly verify the status of critical network functionality. Handheld devices such as Fluke's Network Multimeter can find basic connection problems and confirm critical network operational parameters, ruling out physical-layer issues before the trouble ticket is escalated to a more senior technician.

In a shared Ethernet environment, when too many stations attempt to transmit simultaneously, performance may suffer dramatically due to collisions.

While collisions are a normal part of half-duplex Ethernet operation, every collision forces a re-transmission, so as rising traffic drives the collision count up, the total traffic volume climbs at an accelerating rate.

The network will display a performance curve that suddenly "falls off a cliff" as the number of frames sent, collisions, and re-transmitted packets spirals upward at a rapidly increasing rate.
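
A rough way to see why the curve falls off a cliff is to count transmission attempts per delivered frame. The sketch below is a deliberately simplified model and not from the article: it assumes every attempt collides independently with the same probability, and it ignores Ethernet's backoff timing and retry limit. The function names are hypothetical.

```python
def expected_attempts(collision_probability: float) -> float:
    """Mean transmissions per delivered frame when each attempt collides
    independently with the given probability (geometric distribution)."""
    if not 0.0 <= collision_probability < 1.0:
        raise ValueError("collision_probability must be in [0, 1)")
    return 1.0 / (1.0 - collision_probability)


def carried_load(offered_frames: float, collision_probability: float) -> float:
    """Frames actually placed on the wire, re-transmissions included."""
    return offered_frames * expected_attempts(collision_probability)


if __name__ == "__main__":
    for p in (0.05, 0.25, 0.50, 0.75, 0.90):
        print(f"collision probability {p:.2f}: "
              f"{expected_attempts(p):.1f} attempts per frame")
```

On a real shared segment the collision probability itself rises with the carried load, which is what turns this gentle-looking curve into the spiral described above.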

Keep in mind, however, that a tester connected to a single switch port (not shared media) may see only broadcast frames, which can be very intermittent on low-traffic networks.

Multiple collisions

A switch may operate in full duplex mode, essentially eliminating the shared Ethernet performance drops caused by multiple collisions.

If a link can be established and utilization is reasonable, the user may then press the button corresponding to the ping test to obtain an IP address from the network's DHCP server.

The failure of either a client's or the troubleshooting tool's automatic DHCP configuration could point to a problem with the DHCP relay system.

The process of obtaining a DHCP address demonstrates the viability of the local cable, the local hub or switch port, and the network infrastructure all the way back to the DHCP server. In one simple operation, therefore, most of the nearby network infrastructure has been validated up through Layer 3.

The simple success of a ping indicates that end-to-end Layer 3 connectivity exists between the two devices. The total roundtrip travel time for the request is easily compared to known values to provide a helpful diagnostic for more detailed analysis, if deeper analysis is required.

It is useful to send a series of pings to give the destination multiple opportunities to respond.
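
A series of ping results can be reduced to a few numbers and compared against a known round-trip time. The following is a minimal sketch under stated assumptions: the round-trip times are already collected (by whatever tool is at hand), `None` marks a request that got no reply, and the factor of two used to call a link "slow" is an arbitrary illustrative threshold.

```python
from statistics import mean
from typing import Optional, Sequence


def summarize_pings(rtts_ms: Sequence[Optional[float]], baseline_ms: float,
                    tolerance: float = 2.0) -> dict:
    """Summarize a ping series; None marks a request that got no reply."""
    if not rtts_ms:
        raise ValueError("need at least one ping result")
    replies = [rtt for rtt in rtts_ms if rtt is not None]
    average = mean(replies) if replies else None
    return {
        "reachable": bool(replies),
        "loss_fraction": 1.0 - len(replies) / len(rtts_ms),
        "average_ms": average,
        # Slow only if replies arrived and exceed tolerance x baseline.
        "slow": average is not None and average > tolerance * baseline_ms,
    }


if __name__ == "__main__":
    # Hypothetical series: four requests, one lost, 10 ms known baseline.
    print(summarize_pings([10.0, None, 14.0, 12.0], baseline_ms=10.0))
```

Separating "reachable" from "loss_fraction" matters here: a single reply proves Layer 3 connectivity, while the loss figure hints at how healthy the path is.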

Servers outside the enterprise network may also be used as the target for pinging to verify WAN interconnectivity from the client station and local site to a remote site.

If servers within the firewall respond to ping, but those outside the firewall do not, then the source of the problem may be with routers or other aspects of the network boundary infrastructure.

If pings are successful to both external and internal servers, but the client is not receiving those services, it indicates that the problem lies at a level beyond the physical transport.
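
The reasoning in the last few paragraphs amounts to a small decision table. It can be sketched as follows; the function name and the wording of the results are hypothetical, and the third branch (outside replies but no inside reply) is an added assumption that the article does not discuss.

```python
def locate_fault(internal_ok: bool, external_ok: bool, service_ok: bool) -> str:
    """Map ping and service results to the area worth checking first."""
    if not internal_ok and not external_ok:
        return "local path: cabling, hub or switch port, workstation"
    if internal_ok and not external_ok:
        return "network boundary: routers, firewall, WAN link"
    if not internal_ok:
        # External replies prove the local path works, so suspect the target.
        return "internal target: the server pinged may be down"
    if not service_ok:
        return "above the transport: application or server configuration"
    return "no fault found by these tests"


if __name__ == "__main__":
    print(locate_fault(internal_ok=True, external_ok=False, service_ok=False))
```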

Next steps

If these instant tests are unsuccessful or inconclusive, then it is time to look at the network cabling. If the cable tests are successful but the problem continues, then the call should be escalated to a senior level network technician for resolution.

The next step is to trace the cable into the wiring closet and the local hub or switch. This can be simplified by using a tone probe feature for audible tracing, as well as a flash function for locating port links.

If the hub or switch port test is good, then the workstation might be the source of the problem. This can be verified by testing for the presence of link and the speed and duplex settings offered by the NIC.

Remediation procedures at this point can include rebooting and retesting of the link, network and protocol reconfiguration, and address verification.

If all components are in place and properly configured, and the workstation still does not show proper network and application connectivity, it is time to escalate the problem beyond the field technician level.

While troubleshooting can be a complex chore, understanding the root of a problem before escalating it to a more senior level can be instrumental in reducing workload and saving costs.

If a technician can quickly isolate the problem, he or she can then determine next steps and make the decision as to whether it can be resolved at the department or group level. All it really takes is solid groundwork.

Ron Groulx is a product specialist with Fluke Networks Canada. A member of the IEEE, he has been involved in the field of networking since 1997.

 


The Globe and Mail: Brad Masterson, product manager for Fluke Networks Canada, on users who added bandwidth before finding the root of the problem.
Urban legends: Bandwidth gone wild

By Brad Masterson
Globe and Mail Update

Front Lines is a guest viewpoint section offering perspectives on current issues and events from people working on the front lines of Canada's technology industry. The author is the Canadian product manager for Fluke Networks (www.flukenetworks.ca). He has been involved in the field of networking and network testing since 1995, is a Certified Engineering Technologist registered with OACETT, and is a member of BICSI. He can be reached at brad.masterson@fluke.com.

 

The world of enterprise networking is full of fiscal horror stories.

For example, a school board spent more than $50,000 on services and $100,000 on new equipment to fix a problem on its network to no avail. In desperation, they called in a network troubleshooter, who applied his network analyzer to find the root of the problem. After all the trials and tribulations, testing indicated it was a software glitch (the update, it turned out, was free).

Another school board spent more than $40,000 on network upgrades without seeing any performance improvements. The problem? A configuration issue that left the organization with only 10 per cent of the bandwidth it was supposed to have.

We'd like to think that networking stories like this are exceptional, but that is far from being the case. It is an all-too-common habit in today's environments to spend too much time and budget on fixes before determining the root of the problem. And the most common "fix" of all is thought to be adding more bandwidth.

The general train of thought is that network slowdowns mean you need more bandwidth. In fact, according to Tony Fortunato of The Technology Firm (www.thetechfirm.com), a seasoned network testing expert, bandwidth is the problem in less than 10 per cent of the cases he has encountered. In the meantime, he says, a considerable amount of time and money has been spent on unnecessary fixes.

"I've been called into situations where companies have spent tens of thousands of dollars on upgrades to systems only to find things are worse," he says. "Typically companies have already spent at least $40,000 in fixes before I'm called in. A day or two of troubleshooting with network test tools before you throw in the dollars will often find it's something quite simple and inexpensive."

Fortunato says the cause of slowdowns can range from the keyboard to the Internet Service Provider, and anything in between. The variations he has dealt with are numerous. He said one insurance company spent $1-million in upgrades only to find file retrieval was slower than on the old system. The problem? Software coding and some minor infrastructure problems. After the fix, they found they were actually able to cut their bandwidth requirements in half and save $200,000 a year.

A financial institution was staring at a possible $500,000 in bandwidth upgrades, only to discover the problem was that the network drive mapping for its 7,000 PCs was improperly configured. It was a simple matter of disconnecting the computer in the lab which had been left on in error.

One government agency was actually running too much bandwidth, which slowed the applications down. It simply had to reduce its bandwidth to optimize the application's performance.

"It was the Lucy and Ethel Syndrome," says Fortunato (his pet term that refers to the famous chocolate factory scene where the assembly line keeps accelerating with disastrous consequences). "The application could only process information at a certain rate."

In another case, an oil and gas company discovered that the slowdowns were at the ISP (Internet Service Provider) site, which meant they were being billed for more bandwidth than they were getting — not to mention the fact they had already spent $50,000 on upgrades to resolve the "problem" before finding the real culprit.

These few of many examples indicate that the amount spent on equipment and/or bandwidth upgrades without discovering the root of a network problem can be staggering. Adding bandwidth is a particularly expensive undertaking. A single 3 Meg link can run you $1,500 a month. Adding a 1 Meg link for a large organization with multiple operations can easily take up to $600,000 a year out of an IT budget. Replacing switches, routers or applications: all of these escalate costs into the tens of thousands of dollars or more. In many of those cases, it is money wasted simply for lack of proper diagnostic tools.
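
To put the article's figures side by side, the short calculation below annualizes the quoted link price and expresses Fortunato's "typical" pre-diagnosis spend in months of link charges. Only the two dollar amounts come from the text; the function names are hypothetical and the comparison itself is an illustration rather than something the author computed.

```python
MONTHS_PER_YEAR = 12


def annual_cost(monthly_cost: float) -> float:
    """Yearly cost of a recurring monthly charge."""
    return monthly_cost * MONTHS_PER_YEAR


def months_of_service(budget: float, monthly_cost: float) -> float:
    """How many months of a recurring charge a one-time budget would cover."""
    return budget / monthly_cost


if __name__ == "__main__":
    link_monthly = 1_500    # one 3 Meg link, per the article
    typical_fixes = 40_000  # typical spend before troubleshooting, per Fortunato
    print(annual_cost(link_monthly))
    print(round(months_of_service(typical_fixes, link_monthly), 1))
```

By this arithmetic a single 3 Meg link costs $18,000 a year, and the $40,000 typically spent on misdirected fixes would have paid for more than two years of it.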

Finding the root of the problem is unquestionably a challenge for operations. Specialists in their respective fields (e.g. switches, routers and application development) tend to focus on their area of expertise. Call in a switch or router expert, and they will diagnose your hardware, suggest upgrades and walk away. If the problem is not the switch, then the cabling professional comes in to do their bit. And so on.

Yet a performance slowdown can be virtually anywhere, from the desktop PC with a hard drive that's too full, to cabling, to patches and connections, to hubs and routers, to an application itself. In some cases, it may be caused by something outside the walls of the enterprise. In others, a single fix may not be enough to cure the problem. Using network analyzers for front-line testing and protocol analyzers for more in-depth analysis will quickly pinpoint the source(s) of the problem and provide the groundwork for taking remedial action.

When the right problem is discovered and action taken, the work should not stop there. Few go to the trouble of verifying results, but they should. It is extremely important to retest your network thoroughly to make sure that the problem is fixed and there are no other hidden issues.

Even those who believe in troubleshooting networks before calling in the experts tend to overlook another cardinal rule of good network health: testing your network before a problem rears its head. In many cases once the slowdown occurs, the business impact is already being felt in the reduced ability to process transactions, lower productivity and lost revenues.

While some believe that networks are as robust and consistent as telephone systems, this is not the case. The complexities of a networking infrastructure require regular monitoring to ensure peak performance. Not only does this pinpoint potential problems, it is also a good "policing" technique for monitoring your bandwidth usage and quality of service delivery from providers.

Network analyzers can be used to perform a baseline diagnostic (an especially important step to take when implementing a new network). Tests should then be done routinely to detect any change in performance. Usually a quick tweaking of an application or configuration can bring the network back to top speed.
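
Routine testing against a baseline comes down to recording a set of measurements once and flagging the ones that later move too far. A minimal sketch, assuming the measurements are already available as plain numbers; the metric names and the 20 per cent threshold are hypothetical choices for illustration.

```python
def drifted_metrics(baseline: dict, current: dict,
                    threshold: float = 0.20) -> dict:
    """Return metrics whose relative change from baseline exceeds threshold."""
    flagged = {}
    for name, base_value in baseline.items():
        if name not in current or base_value == 0:
            continue  # nothing to compare against
        change = (current[name] - base_value) / base_value
        if abs(change) > threshold:
            flagged[name] = round(change, 3)
    return flagged


if __name__ == "__main__":
    # Hypothetical baseline taken at installation, and a later routine test.
    baseline = {"utilization_pct": 20.0, "ping_ms": 10.0, "errors_per_min": 4.0}
    current = {"utilization_pct": 22.0, "ping_ms": 35.0, "errors_per_min": 4.0}
    print(drifted_metrics(baseline, current))
```

The value of the baseline is that it turns "the network feels slow" into a specific metric that changed, which is where the tweaking can start.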

However, it's important to match the test to the skills of the person performing it and to make sure the right tool is being used. A front-line cable technician, for example, can perform tests with a simple network analyzer, but in many cases does not have the expertise to work with a protocol analyzer, which is required for much more in-depth analysis and troubleshooting. In many cases, bringing in outside services can be a very cost-effective alternative.

In an ideal world, enterprises would approach their networking investment as we do our cars — or our health. We wouldn't replace an engine if our car performed poorly, or used too much fuel. We would take it to an expert to perform a diagnostic before paying for repairs. Nor would we ask for surgery before running the proper tests.

Why enterprises don't practice the same logic with their networks is a mystery, especially in today's world of fiscal restraint. It is in everyone's best interests to perform proper testing before the big payout. So before contracting for more bandwidth, make sure that's what is really needed. A few simple tests by the right person with the right tools can mean tens or even hundreds of thousands of dollars to bottom line results.
 
