OpHangLimitSecs is a parameter in the StorNext volume configuration, and its default value is 180. What is it? It is the Operation Hang Limit. When a single metadata operation hangs and the MDC can’t get past that operation within the specified time, StorNext will panic the volume and fail it over in an attempt to complete or flush the operation.
It will appear in the logs like this:
StorNext FSS 'Volume1': PANIC: Calling stop HA monitoring func.
StorNext FSS 'Volume1': PANIC: /usr/cvfs/bin/fsm "OpHangLimitSecs exceeded
The name is a bit of a misnomer, because the value is actually measured in half-second “ticks.” So the default value of 180 ticks works out to 90 seconds.
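If the half-second tick interval described above holds for your StorNext release, converting between the configured value and wall-clock time is simple arithmetic. Here is a small Python sketch. The helper names are hypothetical, and the 0.5-second tick is the assumption stated in this post.

```python
# Assumption (from the post): OpHangLimitSecs is counted in half-second
# ticks, not seconds.
TICK_SECONDS = 0.5


def ticks_to_seconds(ticks: int) -> float:
    """Convert an OpHangLimitSecs value (in ticks) to wall-clock seconds."""
    return ticks * TICK_SECONDS


def seconds_to_ticks(seconds: float) -> int:
    """Return the OpHangLimitSecs value needed for a desired hang limit."""
    return round(seconds / TICK_SECONDS)


print(ticks_to_seconds(180))  # the default value
print(seconds_to_ticks(120))  # the value for a 120-second limit
```

Under that assumption, the default of 180 prints as 90.0 seconds, and a 120-second limit would need a value of 240.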
So if you’re seeing this error, what should you look for? It is always something to do with metadata. First, double-check that the hardware is solid. Check all the connections to the metadata storage and look for errors in the fabric. I’ve seen failing Fibre Channel SFPs cause the metadata LUNs to drop from the controllers, triggering an OpHangLimitSecs panic. Bad SFPs, kinked fiber, and bad port settings are all things to check for.
Then move on to the software. More often than not, I see scripts causing OpHangLimitSecs panics. Imagine a script that takes 8 minutes to execute but runs every 3 minutes. You end up with multiple instances of the script piling up and waiting for the same operation to complete. If any one of those instances waits longer than the limit, boom, you have a failover.
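One common way to stop a recurring job from stacking up is a single-instance lock. If a previous run still holds the lock, the new run skips its cycle. Here is a minimal Python sketch using `flock()`. The function names are hypothetical, and the sketch assumes a Unix host (Linux or macOS) where Python's `fcntl` module is available.

```python
import fcntl
import sys
from typing import Callable, Optional, TextIO


def acquire_single_instance_lock(path: str) -> Optional[TextIO]:
    """Try to take an exclusive, non-blocking flock() on `path`.

    Returns the open lock file on success (keep it open while the job runs),
    or None if another instance already holds the lock.
    """
    handle = open(path, "a")
    try:
        fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def run_exclusively(lock_path: str, job: Callable[[], None]) -> bool:
    """Run `job` only if no other instance is running; return True if it ran."""
    lock = acquire_single_instance_lock(lock_path)
    if lock is None:
        # A previous run is still going: skip this cycle instead of piling up.
        print("previous instance still running; skipping this cycle", file=sys.stderr)
        return False
    try:
        job()
    finally:
        lock.close()  # closing the file releases the flock
    return True
```

From cron or launchd you would call something like `run_exclusively("/var/run/metadata-job.lock", do_work)`, where the lock path and `do_work` are hypothetical. Keep the lock file on local disk rather than on the SAN volume, so the guard itself adds no metadata traffic to the volume you are protecting. With the 8-minute job on a 3-minute schedule, you then get at most one run at a time instead of three or more overlapping copies.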
So check metadata paths, check recurrent operations, check metadata networks. This is always a metadata problem.
If you run a heterogeneous shared storage environment (such as a StorNext system), you may find this information about ACLs useful.
*NIX systems use POSIX.1e standard ACLs. Windows, Mac, and [some] others use NFSv4 standard ACLs. In a StorNext SAN environment, macOS and Windows are not able to use the POSIX.1e ACLs that a Linux host writes on files and directories. Likewise, Linux is unable to see the NFSv4 ACLs that a Windows machine or a Mac writes. This can create some major problems if ACLs are essential to your workflow.
Linux writes POSIX.1e ACLs.
Win & Mac write NFSv4 ACLs.
These two standards are NOT compatible.
So what is the solution? Well, to be perfectly frank, there isn’t an easy one. I’ve used several options, with varying degrees of success depending on the situation:
- Periodic Permission Repair Script – A not-very-elegant but still working solution is to run a script from a Mac SAN client. The script should execute periodically (via launchd) and set the permissions in a given folder to whatever your desired setup is. Running it from a Mac lets you set both the POSIX bits (for Linux access) and the ACLs (for Mac/Win access). Once you set these properly via chmod, the permissions should work as intended. In practice, this means you’ll need a “cleaning” folder: you drop items into it, let the script run, and then move the items out.
GOTCHAS: You’ll want this to run frequently so your users don’t have to wait too long, but if you have a large volume, it could respawn while its previous instance is still running. If left unresolved this could cause a
- Triggered Permission Repair Script - Similar to the option above, but instead of scheduling the script, you give users its file path so that a user (assuming they have admin access) can run it on demand. If you don’t want to dole out administrative access to machines but still want users to be able to trigger a repair, consider a web interface on that Mac client with a PHP command that triggers a local bash script.
- Perimeter Protection - Hard on the outside, soft and chewy in the center. Consider giving a top-level directory very restrictive POSIX and ACL permissions, but opening up everything underneath it. You’d need to set your umask on all machines to 000 (along with the Windows equivalent), but it will eliminate most permissions problems down the road.
GOTCHAS: Not gonna work if you need different levels of permission on different nested directories.
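The POSIX half of the periodic repair script can be sketched in Python as shown here. This is a minimal sketch that assumes hypothetical target modes of 775 for folders and 664 for files under the cleaning folder. It does not touch ACLs; on the Mac client you would add those with the system chmod’s `+a` option. To avoid the respawn gotcha above, the scheduled job should also refuse to start while a previous run is still going.

```python
import os

# Hypothetical target modes; substitute whatever your workflow needs.
DIR_MODE = 0o775   # rwxrwxr-x
FILE_MODE = 0o664  # rw-rw-r--


def repair_posix_permissions(root: str, dir_mode: int = DIR_MODE,
                             file_mode: int = FILE_MODE) -> int:
    """Reset POSIX mode bits on `root` and everything below it.

    Symlinks are skipped so the repair never wanders outside the folder.
    Returns the number of items whose mode was set.
    """
    os.chmod(root, dir_mode)
    count = 1
    # os.walk is top-down, so each subfolder is fixed before it is entered.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            os.chmod(path, dir_mode if os.path.isdir(path) else file_mode)
            count += 1
    return count
```

`os.chmod` sets the exact mode and ignores the umask, so the result is the same no matter which account launchd runs the job as, provided that account owns the items (or is root).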
(Currently on Linux, Samba supports NFSv4 ACLs on systems running ZFS, though that support is in beta. Other filesystems (e.g., CVFS) have no Samba compatibility with NFSv4 ACLs.)
“I changed the permalinks to be pretty, but it breaks my site.”
“I updated the web.config with the right code, but it’s still not working.”
These are complaints you’ll commonly find when googling around about pretty permalinks.
Many of the message boards and blogs will tell you that WordPress is meant to run on Linux. Well, it works great on Linux, where it’s arguably more predictable and better supported. But it’s also supported on Mac and Windows. If (for whatever reason) you need to run it on a Windows IIS server, then setting up pretty permalinks will require editing the web.config file for the Microsoft URL Rewrite Module. The WordPress Codex has everything you need to do this on Windows (and Linux too).
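The rule the Codex has you put in web.config boils down to this: if the request doesn’t match an existing file or directory, rewrite it to index.php. In URL Rewrite terms, that is negated IsFile and IsDirectory conditions on `{REQUEST_FILENAME}` plus a Rewrite action. Here is a small Python model of that decision, included only to show what the rule does. It is not IIS code, and the function name and paths are hypothetical.

```python
import os


def rewrite_target(site_root: str, url_path: str) -> str:
    """Model the usual WordPress pretty-permalink rule for IIS URL Rewrite.

    Existing files and directories are served as-is; every other path is
    handed to index.php, which lets WordPress resolve the permalink itself.
    """
    local = os.path.join(site_root, url_path.lstrip("/"))
    if os.path.isfile(local) or os.path.isdir(local):
        return url_path
    return "/index.php"
```

A pretty permalink like /2014/05/hello-world/ has no matching file on disk. If the rule never fires, IIS looks for that file and returns a 404 instead of passing the request to WordPress.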
The one thing that kicked my butt for a couple of hours, and isn’t well documented, is that web.config files don’t nest very well. The site I was working on was in a folder one level down with its own web.config file. I changed to pretty permalinks, filled out my site’s pretty permalink settings properly, and got 404 errors on all the pages. Long story short, I figured out that a web.config file at the root level of the web server was interfering with the URL rewrite. Once I removed that root-level web.config file, my WordPress site began working properly and gave me my pretty permalinks.
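The nesting problem comes from IIS configuration inheritance. A web.config in a parent folder applies to the folders below it, so its settings stack with the ones in your site’s own web.config. A hypothetical Python helper like the one below lists every web.config between the web server root and the site folder, which would point straight at a stray root-level file.

```python
from pathlib import Path
from typing import List


def inherited_web_configs(web_root: str, site_dir: str) -> List[Path]:
    """List every web.config from `web_root` down to `site_dir`, outermost first."""
    root = Path(web_root).resolve()
    site = Path(site_dir).resolve()
    if root != site and root not in site.parents:
        raise ValueError(f"{site} is not inside {root}")
    # site.parents runs from the immediate parent upward; reverse it and keep
    # only folders at or below the web root, then add the site folder itself.
    chain = [p for p in reversed(site.parents) if p == root or root in p.parents]
    chain.append(site)
    return [d / "web.config" for d in chain if (d / "web.config").is_file()]
```

Anything it returns above the site folder is a suspect. Removing that file is what fixed the problem here. As I understand it, URL Rewrite also accepts a `<clear />` element at the top of a child’s `<rules>` section to drop inherited rules, which may be worth checking in the module docs when the root web.config has to stay.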
StorNext is a shared storage environment that offers high performance and large capacity, and the two scale independently.
Apple licensed StorNext from Quantum and rebranded it as Xsan. So it’s very common to hear that “Xsan is StorNext.” But StorNext is not Xsan.
You’ll have to take a look at the currently unmet business needs to help determine what will be of the most benefit to the company.
Basic administration ideas
- Monitoring: set up Nagios to monitor ping, disk space, and CPU usage on all network infrastructure
- Monitoring: set up Cacti to monitor disk usage with historical graphs
- Documentation: are all the server configurations and network information stored in an easily accessible, editable format (think wiki)? If not, roll a wiki instance on your server and document everything. MediaWiki is free, and okay. Atlassian Confluence is better, and not much more than free.
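Nagios plugins are just programs that print one status line, optionally followed by `|` and performance data. They exit with 0 for OK, 1 for WARNING, 2 for CRITICAL, or 3 for UNKNOWN. The stock check_disk plugin already covers disk space, so the sketch below only illustrates that contract in Python. The function names and the 80%/90% thresholds are hypothetical.

```python
import shutil
from typing import List, Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(used_pct: float, warn: float, crit: float) -> Tuple[int, str]:
    """Map a usage percentage to a Nagios status code and label."""
    if used_pct >= crit:
        return CRITICAL, "CRITICAL"
    if used_pct >= warn:
        return WARNING, "WARNING"
    return OK, "OK"


def check_disk(path: str, warn: float = 80.0, crit: float = 90.0) -> Tuple[int, str]:
    """Build a one-line plugin result ("status text | perfdata") for `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {path}: {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path}: reports zero capacity"
    used_pct = 100.0 * usage.used / usage.total
    code, label = classify(used_pct, warn, crit)
    line = (f"DISK {label} - {path} {used_pct:.1f}% used"
            f" | used_pct={used_pct:.1f}%;{warn};{crit};0;100")
    return code, line


def main(argv: List[str]) -> int:
    """Print the status line and return the exit code Nagios expects."""
    path = argv[1] if len(argv) > 1 else "/"
    code, line = check_disk(path)
    print(line)
    return code
```

To use it as a real plugin, end the script with `sys.exit(main(sys.argv))` and point a Nagios command definition at it.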
Monitoring and documentation are the two most critical administrative tools. These are the two pillars of an environment.
If you grew up as an IT guy on Windows, then you’ve been done a disservice. It’s okay, you can unlearn the evil. Here are some key differences that will help you become acclimated to Linux:
- Troubleshooting: there’s always a log file for every service and application in Linux. Having a problem? Consult the log file. Can’t find the log file? Look in the startup script for the application.
- Troubleshooting: learn strace. strace traces the system calls made by a given program. You can see which files are opened, written to, etc., and it is a critically important tool for troubleshooting issues.
- Troubleshooting: use “bash -x”. Running a bash script with the “-x” option prints each command as it executes, which is very valuable for troubleshooting as well.
- Troubleshooting (advanced): gdb (the GNU Debugger). You can run a binary program under gdb and inspect the call stack, see what’s stored in memory, and do other sweet shit.
- Troubleshooting tools: netstat, ps, top, lsof, dig, nslookup, ping, tcpdump. netstat is your window into which program is listening on, or bound to, which port. ps and top let you know which programs are running and how much memory they’re using. lsof lists open files and tells you which programs are accessing them, which is very useful. dig, nslookup, ping, and tcpdump are all handy network troubleshooting tools.
- Learning: man pages. Always read the man page when you’re learning a new command, or at least try to. Sometimes they’re nearly indecipherable, but often they aren’t, and reading one is faster than lazily googling for your quarry.
- Learning: patience. Breathe. You can do it, but you’ll have to take it slow.
- Learn what an inode is. That’s important.
- Next level shit: learn Python. Another way to put this: learn Linux and Python, get a few years (5-8+) under your belt, and you’ll be earning six figures as a Unix engineer for some fancy fucking company.
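An inode is the on-disk record that holds a file’s metadata (owner, mode, size, timestamps, and pointers to the data blocks). A filename is just a directory entry that points at an inode. The hypothetical Python demo below makes that concrete. A hard link shares the original’s inode number, while a copy gets its own. Deleting one name leaves the data reachable through the other.

```python
import os
import tempfile


def inode_demo() -> dict:
    """Show that a hard link is a second name for the same inode."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original.txt")
        hardlink = os.path.join(tmp, "hardlink.txt")
        copy = os.path.join(tmp, "copy.txt")

        with open(original, "w") as f:
            f.write("hello\n")
        os.link(original, hardlink)       # new name, same inode
        with open(copy, "w") as f:        # new file, new inode
            f.write("hello\n")

        st_orig, st_link, st_copy = (os.stat(p) for p in (original, hardlink, copy))
        result = {
            "link_shares_inode": st_orig.st_ino == st_link.st_ino,
            "copy_shares_inode": st_orig.st_ino == st_copy.st_ino,
            "link_count": st_orig.st_nlink,  # two names point at this inode
        }
        os.remove(original)               # drop one name...
        with open(hardlink) as f:         # ...the data is still reachable
            result["data_after_unlink"] = f.read()
    return result


print(inode_demo())
```

`ls -i` shows the same inode numbers from the shell, and `stat` shows the link count.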
Above all, remember: in Linux there’s almost always a semi-sane answer for what’s happening. You just have to know where to look: log files, bash -x, strace, etc. The best way to learn is to set up some basic services and get your learn on. Try BIND, NFS, Samba/CIFS, Nagios, Cacti, and SSH via PKI, and then come back once you’ve done all that.
Oh, and turn off SELinux at first. That’s gonna be a real pain in the ass while you’re learning. Then turn it back on when you’re ready to do some security training.
Take some online (self-paced) Red Hat courses. They can be a little rote, but they’ve got value. Copy and paste the material into an electronic notebook for later use (http://evernote.com).
Hell, if you can do all that, I’ve got a job for you.
I was attempting to create an alias interface (aka sub-interface) to put a second IP address on the same physical interface.
CentOS 6/RHEL 6 don’t have the System > Administration > Login Window menu item, so there is nowhere in the GUI to enable auto-login for a user. If you have a single-user environment and aren’t worried about anyone walking up to your console, this is how you enable auto-login.
You need to edit the file at /etc/gdm/custom.conf
You need to edit the daemon section of the file and add or change these:
Then reboot, and you’ll be logged in automatically.
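If you’d rather script the edit (say, across several kiosk machines), here is a minimal Python sketch. It assumes GDM’s `AutomaticLoginEnable` and `AutomaticLogin` keys in the `[daemon]` section, a hypothetical user named kiosk, and root privileges. Note that configparser rewrites the file without its comments, so keep a backup.

```python
import configparser

GDM_CONF = "/etc/gdm/custom.conf"  # path from the post; writing it needs root


def enable_autologin(conf_path: str, username: str) -> None:
    """Set GDM's [daemon] auto-login keys, keeping the other settings."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str        # keep key case (GDM keys are case-sensitive)
    parser.read(conf_path)          # a missing file just yields an empty config
    if not parser.has_section("daemon"):
        parser.add_section("daemon")
    parser.set("daemon", "AutomaticLoginEnable", "true")
    parser.set("daemon", "AutomaticLogin", username)
    with open(conf_path, "w") as f:
        parser.write(f, space_around_delimiters=False)  # writes key=value
```

Usage would be `enable_autologin(GDM_CONF, "kiosk")`, followed by the reboot.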
Credit on this goes to Sudhaker: http://sudhaker.com/23/centos-rhel-6-autologin
Charles Edge has a great how-to for adding an Xsan client to a StorNext SAN here:
http://krypted.com/xsan/adding-xsan-clients-to-stornext-environments/ (link broken)
This will get you set up if you have no clients on the SAN, and frankly it’s probably the “right” way to do it. But there is a quicker way if you already have some Xsan clients attached and just want to add more.
(verb) – Shoot Myself In The Head
Previously, I’ve posted about how to administer systems manually. From creating users to enabling ARD and SSH, there are a lot of things you can do by hand, and in the past I had to do them on many systems. However, at my new company, we use software by Jamf called Casper.