NetApp NFS performance troubleshooting

on July 21, 2012 in Netapp

The nfs_hist command displays the latency of each type of NFS call as a histogram. You must be in advanced privilege mode to run it, and you should zero the counters with the -z option before you start debugging. It measures the response time the filer takes to handle each type of NFS call, so you can see whether any particular NFS operation is slow to respond.

The key idea: when there is an NFS performance problem involving a filer, you can tell whether it is a network problem or a filer problem by checking how long the filer takes to respond to calls. If the filer responds quickly, investigate the network. If its responses are slow, concentrate on the filer rather than the network.
Filer> nfs_hist

v3 getattr: 48342 (blocking requests) - millisecond units

        0        1        2        3        4        5        6        7
     2883    14076     1577     1274     1705     2324     2973     3634
      <16      <24      <32      <40      <48      <56      <64   UNUSED
    15739     1418      267      147       90       93       59        0
     <128     <192     <256     <320     <384     <448     <512   UNUSED
       73        6        2        2        0        0        0        0
    <1024    <1536    <2048    <2560    <3072    <3584    <4096   UNUSED
        0        0        0        0        0        0        0        0
    <8192   <12288   <16384   <20480   <24576   <28672   <32768   UNUSED
        0        0        0        0        0        0        0        0
   <65536   <98304 <131072 <163840 <196608 <229376 <262144 >262144
        0        0        0        0        0        0        0        0


As you can see, there are a lot of calls here: this filer has handled 48342 getattr calls since its last reboot or since nfs_hist -z was last run. nfs_hist -z zeroes all the counters, which gives us a point of reference for this type of call. We can zero the counters at any time and watch the number of calls change as a test goes on.

The first line shows that the NFS v3 getattr operation has been called 48342 times. The second line shows 0 through 7, and the line below it gives the number of calls handled in that many milliseconds. So 2883 calls were handled in under 1 ms (the 0 column), 14076 in 1 ms, and so on up to 3634 in 7 ms. Above 7 ms the buckets get wider: everything that took from 8 ms up to 16 ms is counted together, 15739 calls, and 16 ms up to 24 ms accounts for 1418. My rule of thumb: you will not notice a degradation in performance until you are above 8192 ms.
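To make the bucket reading concrete, here is a minimal Python sketch that parses the first three label/count row pairs of the getattr histogram above and reports what share of calls finished under a given latency. The helper names (parse_buckets, upper_bound_ms, fraction_under) are made up for this illustration and are not part of any NetApp tool. The sketch also assumes that a column labelled n (0 through 7) covers n ms up to just under n+1 ms, which is an interpretation of the output rather than something the filer states.

```python
# Histogram rows copied from the getattr output above (first three row pairs).
SAMPLE = """\
        0        1        2        3        4        5        6        7
     2883    14076     1577     1274     1705     2324     2973     3634
      <16      <24      <32      <40      <48      <56      <64   UNUSED
    15739     1418      267      147       90       93       59        0
     <128     <192     <256     <320     <384     <448     <512   UNUSED
       73        6        2        2        0        0        0        0
"""


def parse_buckets(text):
    """Return a list of (label, count) pairs, skipping UNUSED slots."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    pairs = []
    # Rows alternate: a row of bucket labels, then a row of counts.
    for labels, counts in zip(rows[0::2], rows[1::2]):
        for label, count in zip(labels, counts):
            if label != "UNUSED":
                pairs.append((label, int(count)))
    return pairs


def upper_bound_ms(label):
    """Exclusive upper bound of a bucket in ms (assumed: '3' -> 4, '<16' -> 16)."""
    if label.startswith("<"):
        return int(label[1:])
    if label.startswith(">"):
        return float("inf")  # open-ended last bucket
    return int(label) + 1


def fraction_under(pairs, limit_ms):
    """Share of calls whose whole bucket lies below limit_ms."""
    total = sum(count for _, count in pairs)
    fast = sum(count for label, count in pairs
               if upper_bound_ms(label) <= limit_ms)
    return fast / total


pairs = parse_buckets(SAMPLE)
total_calls = sum(count for _, count in pairs)
print("total calls:", total_calls)
for limit in (8, 64, 512):
    print(f"under {limit} ms: {fraction_under(pairs, limit):.1%}")
```

The bucket counts add up to the 48342 shown in the header line, which is a quick sanity check that the parse picked up every column. Running the same calculation on output captured after nfs_hist -z gives the distribution for just the test window.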