Engineering Release Notice |
| Component: | SAS_FW_Image |
| Release Date: | 01-31-2008 |
| OEM: | LSI |
| Version: | SAS_FW_Image_APP-1.12.121-0395_MPT-01.18.78.00-IT_MPT3-_BB-R.2.3.13_BIOS-MT33_WEBBIOS-1.1-33d-e_11-Rel_PCLI-_2008_01_31 |
| Package: | 7.0.1-0046 |
| FW_SAS | 1.12.121-0395 |
| Component: | FW_SAS |
| Stream: | FW_SAS_Oconee2_Dev |
| Version: | 1.12.121-0395 |
| Baseline From: | FW_SAS_Release_Verde-1.12.111-0390_2008_01_28 |
| Baseline To: | FW_SAS_Release_Verde-1.12.121-0395_2008_01_31 |
| LSID100092834 | (TASK) | FW_SAS Release Version: 1.12.112-0393 |
| LSID100092837 | (TASK) | update maintenance version in version.c |
| LSID100092796 | (TASK) | Data corruption running I/O w/ capacity expansion |
| LSID100092850 | (TASK) | FW_SAS Release Version: 1.12.121-0395 |
| LSID100092828 | (DFCT) | Data corruption occurs running I/O and performing capacity expansion |
| DFCT ID: | LSID100092828 |
| Customer DFCT No: | DF193182 |
| Headline: | Data corruption occurs running I/O and performing capacity expansion |
| Description: | Data corruption occurs running I/O and performing capacity expansion |
| Version of Bug Reported: | 6.0.2-0001 |
| Version of Bug Fixed: | 1.12.112-0393_PL-Ver-1.22.73.0 |
| Steps to Reproduce: | Please see the OEMSpecific_recreation field. |
| Resolution: | Fixed |
| Resolution Description: | Defect Id: LSID100092804 Issue: Data corruption occurs running I/O and performing capacity expansion. Analysis: During a reconstruction, Megaraid firmware provides online data access to the reconstructing Logical Drive (LD) by internally maintaining two LDs. One represents the portion of capacity that has yet to be reconstructed (the “original LD”), and the other represents the data area that has already completed data reorganization/reconstruction (the “ghost LD”). When a host request is received during a reconstruction, Megaraid firmware determines which of these two internal LDs the request falls within and assigns it to either the original LD or the ghost LD accordingly. Requests straddling both LDs are deferred and rescheduled until the reconstruction point has advanced beyond them. We found that immediately after a reconstruction completes (while Diskerciser is running), Megaraid firmware processes a number of queued host commands that have been assigned to the ghost LD. These commands were received just before the reconstruction completed, were assigned to the ghost LD, and were then deferred for later execution pending completion of the active reconstruction cycle (a cycle is the reconstruction of a given set of rows). If that cycle happens to be the final one (the last set of rows), the reconstruction is complete. Completion triggers the removal of the ghost LD, the reconfiguration of the original LD to match the reconstructed parameters, and the reconfiguration of the entire Megaraid cache to reflect the new, finalized LD configuration. After these steps complete, Megaraid resumes processing the host commands that were queued pending completion of the active (final) reconstruction cycle. 
The problem occurs when Megaraid processes the queued requests that were assigned to the ghost LD: at this point the ghost LD no longer exists, because the reconstruction has completed. These requests are nonetheless allowed to execute internally on the ghost LD, and they complete without error because the ghost LD structure in memory is still valid, albeit abandoned. The corruption occurs because the cache buffers used to process these requests are assigned to the ghost LD, creating potential cache aliases of the original LD; if there is a mix of read and write commands, the data in these aliased cache lines may become stale relative to the data written to disk by the write commands. Fix: Upon completion of a reconstruction, we move any commands pending on the ghost LD queue onto the original LD queue. |
| Customer Defect Track No: | DF193182 |
| Customer List: | LSI -- LSI |
| Fix Impact: | Medium |
| Suggested Testing: | Run the Diskerciser I/O utility during an online capacity expansion (OCE, i.e. reconstruction). |
| Child Tasks: | LSID100092796 |
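The request routing and the fix described in the resolution of LSID100092828 can be sketched as a toy model. This is a minimal Python sketch, not Megaraid firmware code: the class and field names (`ReconstructionModel`, `recon_point`, `ghost_q`, `deferred`, and so on) are hypothetical, and the model assumes reconstruction advances from low block addresses upward, so blocks below the reconstruction point belong to the ghost LD and blocks at or above it belong to the original LD.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    lba: int     # first block of the host request
    blocks: int  # number of blocks requested
    op: str      # "read" or "write"


@dataclass
class ReconstructionModel:
    """Toy model of online reconstruction (all names hypothetical).

    Assumption: reconstruction proceeds from low LBAs upward, so blocks
    below recon_point are already reorganized (ghost LD) and blocks at or
    above it have yet to be reconstructed (original LD).
    """
    recon_point: int
    total_blocks: int
    original_q: deque = field(default_factory=deque)
    ghost_q: deque = field(default_factory=deque)
    deferred: deque = field(default_factory=deque)
    ghost_exists: bool = True

    def route(self, req: Request) -> None:
        end = req.lba + req.blocks  # exclusive end of the request
        if not self.ghost_exists:
            self.original_q.append(req)  # reconstruction done: only one LD left
        elif end <= self.recon_point:
            self.ghost_q.append(req)     # entirely in the reconstructed area
        elif req.lba >= self.recon_point:
            self.original_q.append(req)  # entirely in the not-yet-reconstructed area
        else:
            self.deferred.append(req)    # straddles both LDs: retry after the next cycle

    def advance(self, cycle_blocks: int) -> None:
        """Finish one reconstruction cycle covering cycle_blocks blocks."""
        self.recon_point = min(self.recon_point + cycle_blocks, self.total_blocks)
        pending = list(self.deferred)
        self.deferred.clear()
        for req in pending:
            self.route(req)  # re-evaluate straddlers against the new point
        if self.recon_point == self.total_blocks:
            self.complete()

    def complete(self) -> None:
        """Final cycle finished: the ghost LD is removed."""
        self.ghost_exists = False
        # The fix: hand anything still queued for the ghost LD to the
        # original LD instead of executing it on the abandoned ghost LD.
        self.original_q.extend(self.ghost_q)
        self.ghost_q.clear()


if __name__ == "__main__":
    m = ReconstructionModel(recon_point=100, total_blocks=200)
    m.route(Request(96, 8, "write"))  # blocks 96-103 straddle block 100: deferred
    m.advance(50)                     # point moves to 150: request joins ghost_q
    m.advance(50)                     # final cycle: ghost_q drained into original_q
    print(len(m.ghost_q), len(m.original_q))  # 0 1
```

In this model, the defect corresponds to omitting the two queue-migration lines in `complete()`: the request deferred during the last cycles would stay on `ghost_q` and run against a ghost LD that no longer exists.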
| Task ID: | LSID100092834 |
| Headline: | FW_SAS Release Version: 1.12.112-0393 |
| Description: | FW_SAS Release Version: 1.12.112-0393 |
| State: | Open |
| Change Set Files: | 0 |
| References: |
| Task ID: | LSID100092837 |
| Headline: | update maintenance version in version.c |
| Description: | Update the maintenance version in version.c from 11 to 12 |
| State: | Open |
| Change Set Files: | 0 |
| References: |
| Task ID: | LSID100092796 |
| Headline: | Data corruption running I/O w/ capacity expansion |
| Description: | Issue: Data corruption occurs running I/O and performing capacity expansion.
Analysis: During a reconstruction, Megaraid firmware provides online data access to the reconstructing Logical Drive (LD) by internally maintaining two LDs. One represents the portion of capacity that has yet to be reconstructed (the “original LD”), and the other represents the data area that has already completed data reorganization/reconstruction (the “ghost LD”). When a host request is received during a reconstruction, Megaraid firmware determines which of these two internal LDs the request falls within and assigns it to either the original LD or the ghost LD accordingly. Requests straddling both LDs are deferred and rescheduled until the reconstruction point has advanced beyond them. We found that immediately after a reconstruction completes (while Diskerciser is running), Megaraid firmware processes a number of queued host commands that have been assigned to the ghost LD. These commands were received just before the reconstruction completed, were assigned to the ghost LD, and were then deferred for later execution pending completion of the active reconstruction cycle (a cycle is the reconstruction of a given set of rows). If that cycle happens to be the final one (the last set of rows), the reconstruction is complete. Completion triggers the removal of the ghost LD, the reconfiguration of the original LD to match the reconstructed parameters, and the reconfiguration of the entire Megaraid cache to reflect the new, finalized LD configuration. After these steps complete, Megaraid resumes processing the host commands that were queued pending completion of the active (final) reconstruction cycle. The problem occurs when Megaraid processes the queued requests that were assigned to the ghost LD: at this point the ghost LD no longer exists, because the reconstruction has completed.
These requests are nonetheless allowed to execute internally on the ghost LD, and they complete without error because the ghost LD structure in memory is still valid, albeit abandoned. The corruption occurs because the cache buffers used to process these requests are assigned to the ghost LD, creating potential cache aliases of the original LD; if there is a mix of read and write commands, the data in these aliased cache lines may become stale relative to the data written to disk by the write commands. Fix: Upon completion of a reconstruction, we move any commands pending on the ghost LD queue onto the original LD queue. |
| State: | Completed |
| Change Set Files: | 0 |
| References: | LSID100092828(DFCT) |
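The stale-data mechanism in the analysis of LSID100092796 can be illustrated with a simplified cache model. This is a minimal Python sketch under stated assumptions, not the Megaraid cache design: it assumes a write-through cache whose lines are keyed by (LD, block), and the names `AliasingCache` and `replay` are hypothetical. It shows how one disk block cached under both a ghost-LD key and an original-LD key can return stale data, and how redirecting the queued commands to the original LD avoids it.

```python
class AliasingCache:
    """Toy write-through cache keyed by (ld_id, lba); hypothetical model.

    The real Megaraid cache layout is not described in this notice. The
    point is only that one disk block can be cached under two different
    keys when commands name two different LDs.
    """

    def __init__(self, disk):
        self.disk = disk   # lba -> data, shared backing store
        self.lines = {}    # (ld_id, lba) -> cached data

    def read(self, ld_id, lba):
        key = (ld_id, lba)
        if key not in self.lines:
            self.lines[key] = self.disk[lba]  # cache miss: fill from disk
        return self.lines[key]

    def write(self, ld_id, lba, data):
        self.disk[lba] = data                 # write-through to disk
        self.lines[(ld_id, lba)] = data       # updates only this LD's line


def replay(ghost_commands_redirected):
    """Replay a read/write/read mix on block 42 after reconstruction ends."""
    cache = AliasingCache({42: "old"})
    # LD named by the commands that were queued for the ghost LD.
    queued_ld = "original" if ghost_commands_redirected else "ghost"
    cache.read(queued_ld, 42)            # queued read fills a cache line
    cache.write("original", 42, "new")   # host write goes via the original LD
    return cache.read(queued_ld, 42)     # second queued read of the same block


if __name__ == "__main__":
    print(replay(False), replay(True))  # old new
```

Without the fix, the second queued read hits the ghost-LD alias filled before the write and returns `"old"` even though the disk holds `"new"`. With the queued commands moved to the original LD, both reads use the same cache line, so the write updates it and the second read returns `"new"`.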
| Task ID: | LSID100092850 |
| Headline: | FW_SAS Release Version: 1.12.121-0395 |
| Description: | FW_SAS Release Version: 1.12.121-0395 |
| State: | Open |
| Change Set Files: | 0 |
| References: |