RMAN and EMC DD Issue With An Archive Log When Running A Restore on 1 of 3 servers (Doc ID 2116741.1)
APPLIES TO:
Oracle Database - Enterprise Edition - Version 11.2.0.4 and later
Oracle Database Cloud Schema Service - Version N/A and later
Oracle Database Exadata Cloud Machine - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Oracle Database Cloud Exadata Service - Version N/A and later
Information in this document applies to any platform.
SYMPTOMS
Errors are reported for an archived log file when recovering the database on one server. The same file was restored to other servers/standby hosts, where the log was restored and applied successfully. Only a single standby server fails to apply the log.
CHANGES
CAUSE
EMC's Data Domain backup with RMAN, using DD Boost functionality.
https://www.emc.com/collateral/hardware/data-sheet/h6811-datadomain-ds.pdf
Simply put, DD Boost is software that enhances how backup servers and clients interact with a Data Domain backup appliance. It is based on Symantec's OST (Open Storage Technology) protocol and is a means to extend Data Domain features back to the source.
The issue does not reproduce without DD Boost.
SOLUTION
Events can be set to help identify block corruptions when defining the RMAN channels.

What impact does that have on the backups? Will they take longer? Is this something we should always have on, or just to debug issues?

- From this point onwards, run all backups with event 10466 turned on for the RMAN channels that are doing the backup. This event turns on additional corruption detection in the Oracle I/O layers beneath RMAN. The purpose of setting this event is to determine whether the corruption is introduced during the backup process or occurs after the backup. Here is an example of an RMAN script that runs with event 10466 enabled only for the RMAN backups, and not for any other I/O done by this database:

    run {
      allocate channel c1 type sbt;
      sql channel c1 "alter session set events ''10466 trace name context forever, level 1''";
      backup database;
    }

- Do aggressive RMAN validation of every backup that is written to the problematic storage. Ideally, validate every backup immediately after its creation, and then periodically thereafter.
The completion of the backup indicates that RMAN, with event 10466 set, read each block, validated it, and added HARD block bits to each backup, similar to cksum at the OS level. If corruption is detected in the RMAN/Oracle layers, the backup will fail. If corruption is detected only during validation, it occurred after the blocks were passed to the media manager or OS layer. When Oracle hands off the blocks, the receiver returns an ack to acknowledge receipt of the blocks sent. We expect the data that was sent to be returned unchanged, so if backups taken with event 10466 set complete successfully and corruption is detected only during validation, the OS vendor, media manager vendor, and storage vendor should be engaged.
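The validation step described above could look something like the following RMAN sketch. It is only an illustration: it assumes the same sbt channel as the backup script, and any media manager parameters for the channel are left out because they are site specific. RESTORE ... VALIDATE reads the backup pieces and checks them for corruption without writing any restored files.

```
run {
  # Assumption: same channel type as the backup; add site-specific PARMS if the media manager needs them.
  allocate channel c1 type sbt;
  # Read the datafile backups from the media manager and check them; nothing is restored to disk.
  restore database validate;
  # Do the same for the backed-up archived logs.
  restore archivelog all validate;
}
```

Running this right after each backup, and again periodically, helps narrow down when a backup piece stopped reading back cleanly.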
In this case, the cause was determined to be a hardware failure. The same archivelogs were restored to 3 separate servers and applied to standby databases, and only 1 server consistently showed errors. That host was taken out of service.