Monday, March 31, 2014

AX4 buffer I/O error

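The AX4 is the lowest-end storage array in the EMC lineup, so its controllers run in active/passive mode. When a Linux server boots and loads the driver, it probes the hardware. When it probes controller B while controller A is alive, controller B does not respond to the Linux server, so a buffer I/O error is logged. Once multipath or PowerPath has loaded, the LUNs we allocated can be accessed normally, so this error message can be ignored.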


Root Cause
· Storage arrays in a SAN are generally set up redundantly so that hosts can access logical units (LUNs) over any one of several paths. Typically these arrays operate in one of two modes: active/active or active/passive. With an active/active array, I/O can be sent down any path to a LUN and will be processed by the controller on that path. With an active/passive array, one controller is the primary for each LUN and the other controller is a backup. Some of these arrays accept I/O for a LUN over the backup controller, but the access is not optimized (i.e., performance is worse). Other active/passive arrays do not accept any I/O for a LUN on the backup controller, so any command sent to it results in an I/O error.
· In RHEL, a number of commands and utilities can send I/O to different devices, such as LVM, udev, and fdisk, as well as applications such as databases and web servers. If any of these issues I/O to a passive path on an array that does not accept it, an I/O error appears in the logs. The messages are harmless and do not indicate a problem, but they may fill up the logs or cause unwarranted concern. As a result, some may wish to avoid these errors by preventing applications from accessing the passive paths. Typically, filtering devices out of LVM makes the majority of these errors go away. Likewise, avoiding commands like 'fdisk -l' that scan all devices can reduce their frequency. Finally, configuring any user applications that scan or access multiple devices to access only the appropriate active path or the logical multipath device (/dev/mapper/mpath*, /dev/emcpower*, /dev/sddlma*, etc.) can cut down on the errors as well.

Resolution

Note: The following applies only to I/O errors caused by accessing passive paths.  See the Root Cause and Diagnostic Steps for more information on determining whether this applies to your environment.
· One way to cut down on the number of spurious I/O errors in the system logs is to avoid scanning passive paths with LVM commands. This can be done with a filter in /etc/lvm/lvm.conf that scans only devices from device-mapper-multipath, EMC PowerPath, Hitachi HDLM, or another multipath solution, and avoids the underlying SCSI device nodes.
· I/O errors may be caused by any utility or program that accesses passive storage paths, so it may be necessary to configure or run them in a way that avoids these devices. For instance, rather than using 'fdisk -l', specify an individual device, such as 'fdisk -l /dev/mapper/mpatha'.
· Some storage arrays, such as the EMC Clariion, offer an option to enable a type of active/active mode known as ALUA. With ALUA, path groups are established with different priorities. Multipath software such as device-mapper-multipath recognizes these path groups and sends I/O to the higher-priority paths, and if I/O does end up going down a passive path it may not generate an I/O error. If your array supports such a mode, enabling it may prevent these I/O errors. This access method generally requires a configuration change in the multipath software as well.
Note: I/O errors caused by unintentional access to passive paths are not harmful and should not cause any issues on a system.  They can be safely ignored.
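
The LVM filter mentioned in the first Resolution item might look roughly like the fragment below. This is a minimal sketch, not a tested configuration: it assumes the multipath devices appear as /dev/mapper/mpath* or /dev/emcpower*, as in the examples above, and the /dev/sda2 entry is a hypothetical local disk holding the root volume group. Adjust the patterns to the device names on your own system.

```
# /etc/lvm/lvm.conf (fragment, illustrative only)
devices {
    # Patterns are tried in order and the first match decides.
    # "a" accepts a device, "r" rejects it.
    filter = [
        "a|^/dev/sda2$|",            # hypothetical local disk with the root VG
        "a|^/dev/mapper/mpath.*|",   # device-mapper-multipath devices
        "a|^/dev/emcpower.*|",       # EMC PowerPath pseudo devices
        "r|.*|"                      # reject everything else, including /dev/sd* paths
    ]
}
```

The final "r|.*|" entry is what keeps LVM away from the underlying SCSI paths. Any local disk that holds a volume group, especially the root one, must be accepted before that entry, or the system may fail to find its volumes. After changing the filter, running a command such as 'pvs' should list only the devices you intended to allow.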
