Engineering Release Notice
Component: SAS_FW_Image
Release Date: 01-31-2008
OEM: LSI
Version: SAS_FW_Image_APP-1.12.121-0395_MPT-01.18.78.00-IT_MPT3-_BB-R.2.3.13_BIOS-MT33_WEBBIOS-1.1-33d-e_11-Rel_PCLI-_2008_01_31
Package: 7.0.1-0046
FW_SAS 1.12.121-0395


FW_SAS
Component: FW_SAS
Stream: FW_SAS_Oconee2_Dev
Version: 1.12.121-0395
Baseline From: FW_SAS_Release_Verde-1.12.111-0390_2008_01_28
Baseline To: FW_SAS_Release_Verde-1.12.121-0395_2008_01_31
CHANGE SUMMARY:
LSID100092834 (TASK) FW_SAS Release Version: 1.12.112-0393
LSID100092837 (TASK) update maintenance version in version.c
LSID100092796 (TASK) Data corruption running I/O w/ capacity expansion
LSID100092850 (TASK) FW_SAS Release Version: 1.12.121-0395
LSID100092828 (DFCT) Data corruption occurs running I/O and performing capacity expansion
DEFECT RECORDS (Total Defects=1, Number Duplicate=0):
FW_SAS DEFECTS
DFCT ID: LSID100092828
Customer DFCT No: DF193182
Headline: Data corruption occurs running I/O and performing capacity expansion
Description: Data corruption occurs running I/O and performing capacity expansion
Version of Bug Reported: 6.0.2-0001
Version of Bug Fixed: 1.12.112-0393_PL-Ver-1.22.73.0
Steps to Reproduce: Please see the OEMSpecific_recreation field.
Resolution: Fixed
Resolution Description: Defect Id: LSID100092804
Issue: Data corruption occurs running I/O and performing capacity expansion.
Analysis: During a reconstruction, Megaraid firmware facilitates online data access to a reconstructing Logical Drive by internally maintaining two Logical Drives: one represents the portion of capacity that has yet to be reconstructed (“original LD”), while the other represents the data area that has completed data reorganization/reconstruction (“ghost LD”). When a host request is received during a reconstruction, Megaraid firmware determines which of these two internal LDs the request falls within and accordingly assigns the request to either the original LD or the ghost LD. Requests straddling both LDs are deferred and rescheduled until the reconstruction point has advanced beyond the straddling host request.
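
The routing decision described above can be sketched as follows. This is an illustration only, not the firmware's code: the names route_t, route_request and recon_point are hypothetical, and the sketch assumes the reconstruction point can be expressed as a single host LBA that advances upward, with everything below it already reconstructed.

```c
#include <stdint.h>

/* Hypothetical names throughout; not taken from the firmware source. */
typedef enum { ROUTE_ORIGINAL_LD, ROUTE_GHOST_LD, ROUTE_DEFER } route_t;

/*
 * Assumption: recon_point is the first LBA that has NOT yet been
 * reconstructed. Blocks below it fall within the ghost LD (already
 * reconstructed); blocks at or above it fall within the original LD.
 */
static route_t route_request(uint64_t start_lba, uint32_t num_blocks,
                             uint64_t recon_point)
{
    uint64_t end_lba = start_lba + num_blocks; /* exclusive end */

    if (end_lba <= recon_point)
        return ROUTE_GHOST_LD;      /* entirely in the reconstructed area */
    if (start_lba >= recon_point)
        return ROUTE_ORIGINAL_LD;   /* entirely in the not-yet-reconstructed area */
    return ROUTE_DEFER;             /* straddles the reconstruction point */
}
```

A deferred request would be re-evaluated with the same function once the reconstruction point has moved past its last block.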

We found that immediately after a reconstruction completes (with Diskerciser running), Megaraid firmware processes a certain number of queued host commands that were assigned to the ghost LD. These commands were received just before the reconstruction completed, were assigned to the ghost LD, then deferred for later execution pending completion of the active reconstruction cycle (a cycle represents the reconstruction of a given set of rows). If the reconstructing cycle happens to be the final cycle (last set of rows), the reconstruction has completed. This triggers the removal of the ghost LD, the reconfiguration of the original LD to match the reconstructed parameters, and the reconfiguration of the entire Megaraid cache to reflect the new, finalized LD configuration. After these steps have completed, Megaraid resumes processing of the host commands that were queued pending the completion of the active (final) reconstruction cycle. The problem occurs when Megaraid processes the queued requests that were assigned to the ghost LD; at this point the ghost LD no longer exists because the reconstruction has completed. These requests are internally allowed to execute on the ghost LD nonetheless; they complete without error because the ghost LD structure in memory is still valid, albeit abandoned. The corruption occurs because the cache buffers used in processing these requests are assigned to the ghost LD, creating potential cache aliases to the original LD; if there is a mix of read and write commands, data in these cache line aliases may become stale relative to the data that the write commands updated on disk.
Fix: Upon completion of a reconstruction, we move any commands pending for the ghost LD queue onto the original LD queue.
Customer Defect Track No: DF193182
Customer List: LSI -- LSI
Fix Impact: Medium
Suggested Testing: Run Diskerciser IO utility with Online OCE (Reconstruction)
Child Tasks: LSID100092796
UCM ACTIVITY / TASK RECORDS (4):
FW_SAS UCM TASKS
Task ID: LSID100092834
Headline: FW_SAS Release Version: 1.12.112-0393
Description: FW_SAS Release Version: 1.12.112-0393
State: Open
Change Set Files: 0
References:  
FW_SAS UCM TASKS
Task ID: LSID100092837
Headline: update maintenance version in version.c
Description: update maintenance version in version.c from 11 to 12
State: Open
Change Set Files: 0
References:  
FW_SAS UCM TASKS
Task ID: LSID100092796
Headline: Data corruption running I/O w/ capacity expansion
Description: Issue: Data corruption occurs running I/O and performing capacity expansion.

Analysis: During a reconstruction, Megaraid firmware facilitates online data access to a reconstructing Logical Drive by internally maintaining two Logical Drives: one represents the portion of capacity that has yet to be reconstructed (“original LD”), while the other represents the data area that has completed data reorganization/reconstruction (“ghost LD”). When a host request is received during a reconstruction, Megaraid firmware determines which of these two internal LDs the request falls within and accordingly assigns the request to either the original LD or the ghost LD. Requests straddling both LDs are deferred and rescheduled until the reconstruction point has advanced beyond the straddling host request.

We found that immediately after a reconstruction completes (with Diskerciser running), Megaraid firmware processes a certain number of queued host commands that were assigned to the ghost LD. These commands were received just before the reconstruction completed, were assigned to the ghost LD, then deferred for later execution pending completion of the active reconstruction cycle (a cycle represents the reconstruction of a given set of rows). If the reconstructing cycle happens to be the final cycle (last set of rows), the reconstruction has completed. This triggers the removal of the ghost LD, the reconfiguration of the original LD to match the reconstructed parameters, and the reconfiguration of the entire Megaraid cache to reflect the new, finalized LD configuration. After these steps have completed, Megaraid resumes processing of the host commands that were queued pending the completion of the active (final) reconstruction cycle. The problem occurs when Megaraid processes the queued requests that were assigned to the ghost LD; at this point the ghost LD no longer exists because the reconstruction has completed. These requests are internally allowed to execute on the ghost LD nonetheless; they complete without error because the ghost LD structure in memory is still valid, albeit abandoned. The corruption occurs because the cache buffers used in processing these requests are assigned to the ghost LD, creating potential cache aliases to the original LD; if there is a mix of read and write commands, data in these cache line aliases may become stale relative to the data that the write commands updated on disk.

Fix: Upon completion of a reconstruction, we move any commands pending for the ghost LD queue onto the original LD queue.
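
A minimal sketch of the kind of queue hand-off the fix describes, under stated assumptions: the structures cmd and cmd_queue, the function requeue_ghost_cmds, and the target_ld field are hypothetical and are not taken from the firmware source. The sketch also assumes that moved commands are appended to the tail of the original LD queue in their existing order; the notice does not state the ordering used.

```c
#include <stddef.h>

/* Hypothetical structures for illustration only. */
struct cmd {
    struct cmd *next;
    int target_ld;          /* LD the command is assigned to */
};

struct cmd_queue {
    struct cmd *head;
    struct cmd *tail;
};

/*
 * Retarget every command pending on the ghost LD queue to the original
 * LD and splice the whole list onto the tail of the original LD queue.
 * The ghost queue is left empty. Returns the number of commands moved.
 */
static size_t requeue_ghost_cmds(struct cmd_queue *ghost,
                                 struct cmd_queue *orig, int orig_ld_id)
{
    size_t moved = 0;

    for (struct cmd *c = ghost->head; c != NULL; c = c->next) {
        c->target_ld = orig_ld_id;
        moved++;
    }
    if (ghost->head == NULL)
        return 0;                       /* nothing pending on the ghost LD */

    if (orig->tail != NULL)
        orig->tail->next = ghost->head; /* append after existing commands */
    else
        orig->head = ghost->head;       /* original queue was empty */
    orig->tail = ghost->tail;

    ghost->head = NULL;
    ghost->tail = NULL;
    return moved;
}
```

Because the commands are retargeted before they run, any cache buffers they use would be associated with the original LD, which is the condition the analysis identifies as necessary to avoid cache aliases.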
State: Completed
Change Set Files: 0
References: LSID100092828 (DFCT)
FW_SAS UCM TASKS
Task ID: LSID100092850
Headline: FW_SAS Release Version: 1.12.121-0395
Description: FW_SAS Release Version: 1.12.121-0395
State: Open
Change Set Files: 0
References: