Root Cause
·
Storage arrays in a SAN are generally set
up in a redundant fashion such that hosts can access logical units (LUNs) over
one of many different paths. Typically these arrays operate in one of two
different modes: active/active or active/passive. With an active/active
array, I/O can be sent down any one of the paths to a LUN and it will be
processed by that controller. With active/passive arrays, one controller is
considered the primary for each LUN, while the other controller is a
backup. Some of these arrays will accept I/O for a LUN over the backup
controller, but the I/O will not be optimized (that is, performance will be
worse). However, other active/passive arrays will not accept any I/O on the
backup controller for a LUN, and thus any commands sent to it will result in
an I/O error.
·
In RHEL, a number of different commands and utilities can send I/O to
devices, such as LVM, udev, and fdisk, in addition to applications such as
databases and web servers. If any of these issues I/O to a passive path on an
array that does not accept it, an I/O error is written to the logs. The
messages are harmless and do not indicate a problem, but
they may fill up the logs or cause unwarranted concern. As a result,
some may wish to try to avoid these errors by preventing applications from
accessing the passive paths. Typically, filtering devices out from LVM
will cause the majority of these errors to go away. Likewise, avoiding
commands like 'fdisk -l' that scan all devices can reduce their
frequency. Finally, configuring any user applications that scan or access
multiple devices to access only the appropriate active path or the logical
multipath device (/dev/mapper/mpath*, /dev/emcpower*, /dev/sddlma*, etc.) can
cut down on the errors as well.
Resolution
Note: The following applies
only to I/O errors caused by accessing passive paths. See the Root Cause
and Diagnostic Steps for more information on determining whether this applies
to your environment.
·
One way to cut down on the number of
spurious I/O errors in the system logs is to avoid
scanning passive paths with LVM commands.
This can be done with a filter in /etc/lvm/lvm.conf that accepts only
devices from device-mapper-multipath, EMC PowerPath,
Hitachi HDLM, or another multipath solution, and rejects the
underlying SCSI device nodes.
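As a rough illustration of such a filter, the fragment below accepts device-mapper-multipath and PowerPath nodes and rejects everything else. The device name patterns are assumptions taken from the examples in this article, and the commented-out /dev/sda line is a hypothetical local boot disk. If any volume group, especially the root volume group, lives on a device the filter rejects, LVM will no longer see it, so adjust the accept entries to match your system before using anything like this.

```
# /etc/lvm/lvm.conf (sketch only; patterns are assumptions, adapt to your system)
devices {
    # Entries are regular expressions checked in order; the first match wins.
    # "a" accepts the device, "r" rejects it.
    filter = [
        "a|^/dev/mapper/mpath.*|",    # device-mapper-multipath devices
        "a|^/dev/emcpower.*|",        # EMC PowerPath pseudo devices
        # "a|^/dev/sda[0-9]*$|",      # hypothetical local disk holding the root VG
        "r|.*|"                       # reject all remaining devices, including sd* paths
    ]
}
```

After changing the filter, running an LVM scan command such as 'pvs' and watching the logs is one way to check that the physical volumes are still found and that the passive-path errors no longer appear.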
·
I/O errors may be caused by any utility or
program that accesses passive storage paths, so it may be necessary to
configure or run them in a way that avoids these devices. For
instance, rather than using 'fdisk -l', specify an individual device, such as
'fdisk -l /dev/mapper/mpatha'.
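The same idea can be wrapped in a small shell helper so that only multipath nodes are ever handed to fdisk. This is a minimal sketch, not a supported tool: the function names are hypothetical, and the mpath* pattern assumes device-mapper-multipath with user-friendly names, as in the example above.

```shell
#!/bin/sh
# Print the multipath device nodes found in a directory (default: /dev/mapper).
# Assumption: devices are named mpath*. Partition mappings such as mpatha1
# or mpathap1 match this pattern as well.
list_mpath_devices() {
    dir="${1:-/dev/mapper}"
    for dev in "$dir"/mpath*; do
        # When nothing matches, the shell leaves the pattern as-is; skip it.
        [ -e "$dev" ] || continue
        printf '%s\n' "$dev"
    done
}

# Run 'fdisk -l' on each multipath device only, never on the sd* paths.
fdisk_mpath_only() {
    list_mpath_devices "$1" | while IFS= read -r dev; do
        fdisk -l "$dev"
    done
}
```

Calling 'fdisk_mpath_only' as root prints the partition tables of the multipath devices without touching the underlying paths. 'multipath -ll' can be used beforehand to see which sd* devices sit behind each multipath device.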
·
Some storage arrays, such as the EMC Clariion, offer an option to enable a
type of active/active mode known as ALUA (Asymmetric Logical Unit
Access). With ALUA, path groups are established with different
priorities. Multipath software such as device-mapper-multipath will
recognize these path groups and send I/O to the higher priority paths, but if
I/O does end up going down a lower priority path it may not generate an I/O
error. If your array supports such a mode, enabling it may prevent these
I/O errors. This different access method generally requires a
configuration change in the multipath software as well.
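For device-mapper-multipath, that configuration change might look roughly like the device section below. Treat it as a sketch under assumptions rather than a recommended configuration: the vendor string "DGC" is the one Clariion arrays normally report, 'prio alua' is the syntax used by newer multipath-tools releases (older releases used a prio_callout program instead), and the correct values for your array and RHEL release should come from the array vendor's documentation.

```
# /etc/multipath.conf (sketch only; verify every value against vendor documentation)
devices {
    device {
        vendor                "DGC"            # assumed vendor string for Clariion
        product               ".*"
        path_grouping_policy  group_by_prio    # group paths by ALUA priority
        prio                  alua             # ask the array for each path's ALUA state
        hardware_handler      "1 alua"
        failback              immediate        # return to the preferred group when it recovers
    }
}
```

The array side must also be switched to its ALUA mode for this to have any effect. After reloading the multipath configuration, 'multipath -ll' should show the paths split into groups with different priorities.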
Note: I/O errors caused by unintentional access to
passive paths are not harmful and should not cause any issues on a
system. They can be safely ignored.