OpHangLimitSecs is a parameter in the StorNext volume configuration. The default value is 180. What is it? It is the Operation Hang Limit. When a single metadata operation hangs and the MDC can’t proceed past it for the specified length of time, StorNext will panic the volume and fail it over in an attempt to complete the operation or flush it.
It will appear in the logs like this:
StorNext FSS 'Volume1': PANIC: Calling stop HA monitoring func.
StorNext FSS 'Volume1': PANIC: /usr/cvfs/bin/fsm "OpHangLimitSecs exceeded
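If you want to sweep a pile of log files for this panic, a small filter is enough. This is a minimal sketch: the pattern is built only from the sample line quoted above, so the exact wording in other StorNext versions is an assumption, and `find_ophang_panics` is a hypothetical helper name, not part of any StorNext tooling. Feed it lines from whichever log file your system writes these messages to.

```python
import re

# Pattern derived only from the sample log line in this post; other
# StorNext releases may word the message differently (assumption).
PANIC_RE = re.compile(r"StorNext FSS '([^']+)': PANIC: .*OpHangLimitSecs exceeded")


def find_ophang_panics(lines):
    """Return the volume name from every line reporting an OpHangLimitSecs panic."""
    volumes = []
    for line in lines:
        match = PANIC_RE.search(line)
        if match:
            volumes.append(match.group(1))
    return volumes


# Sample input copied from the log excerpt above.
sample = [
    "StorNext FSS 'Volume1': PANIC: Calling stop HA monitoring func.",
    "StorNext FSS 'Volume1': PANIC: /usr/cvfs/bin/fsm \"OpHangLimitSecs exceeded",
]
print(find_ophang_panics(sample))
```

Only the second sample line matches, because the first one never mentions OpHangLimitSecs.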
The name is a bit of a misnomer, because it’s actually measured in half-second “ticks.” So the default value is 180 ticks, or 90 seconds.
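The conversion is simple, but it is easy to get backwards when you are picking a new value. A quick sketch, taking the half-second tick described above as given (the function names are just illustrative):

```python
# One tick is half a second, per the description above.
TICK_SECONDS = 0.5


def ticks_to_seconds(ticks):
    """Convert a configured OpHangLimitSecs value (in ticks) to wall-clock seconds."""
    return ticks * TICK_SECONDS


def seconds_to_ticks(seconds):
    """Convert a desired timeout in seconds to the value you would configure."""
    return int(seconds / TICK_SECONDS)


print(ticks_to_seconds(180))  # the default setting
print(seconds_to_ticks(120))  # value needed for a two-minute limit
```

So the default of 180 works out to 90.0 seconds, and a two-minute limit would need a setting of 240.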
So if you’re seeing this error, what should you look for? It is always something to do with metadata. First, double-check that the hardware is solid. Check all the connections to the metadata storage and look for errors in the fabric. I’ve seen Fibre Channel SFPs that were going bad cause the metadata LUNs to drop from the controllers, causing an OpHangLimitSecs panic. SFPs, kinked fiber, and bad port settings are all things to check for.
Then move on to the software. More often than not, I see scripts causing OpHangLimitSecs panics. Imagine a script that takes 8 minutes to execute, but runs every 3 minutes. You end up with multiple instances of this script piling up and waiting for the same operation to complete. If any one of those instances waits more than the specified time, boom, you have a failover.
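To put a number on the pile-up, here is a rough back-of-the-envelope sketch. It assumes every run takes a fixed amount of time and that the scheduler starts a new copy on schedule whether or not earlier ones have finished. That is a simplification: in practice the copies contend with each other and slow down, so the real count is likely to be worse. The function name is hypothetical.

```python
import math


def overlapping_instances(runtime_min, interval_min):
    """Peak number of copies running at once, assuming a fixed runtime
    and a new copy started every interval_min no matter what."""
    return math.ceil(runtime_min / interval_min)


# The example from above: an 8-minute script launched every 3 minutes.
print(overlapping_instances(8, 3))
# A script that finishes well inside its interval never overlaps itself.
print(overlapping_instances(2, 3))
```

For the 8-minute script on a 3-minute schedule this gives 3 copies in flight at the peak, all competing for the same metadata operations, while the 2-minute script stays at 1.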
So check metadata paths, check recurrent operations, check metadata networks. This is always a metadata problem.