ORACLE - MANAGING THE CORRUPTION

ORACLE - MANAGING THE CORRUPTION

Oracle database corruption and recovery

Srinivas Maddali

In this document, I discussed various possible corruptions (HW and SW) and recovery and salvage processes. Many MOS docs and Oracle Docs and my experiences put together is this document.

The causes of corruption can be any one of the below reason or multiple reasons.

Bad IO hardware or firmware

OS issues

Oracle Bugs while writing to the files it has formatted (with 0s as 0s have no value unless you have digits on the left or right side)

Recovering NOLOGGING or UNRECOVERABLE (THAT CAN BE RECOVERED IN 12.2.0. (BEFORE 12.2.0 we hit ORA-01578

 

So, broadly the corruption is physical and logical. While logical corruption is recoverable physical is NOT. if you are using local disks with NO mirroring the OS corruption may NOT be recoverable.

Fixing corruption is based on the understanding of :

Determine the extent of corruption and also determining if that corruption is transient or permanent

Replacing the faulty HW and recovering the possible data after running the tools like ckdsk to block the corrupt OS blocks which can break Oracle block as multiple OS blocks can be formatted as Oracle Block. Based on the Oracle block sie, we may NOT be able to read data written to that block.

Determine which logical objects are impacted or affected

Determine the needed recovery method based on the affected data and object/s

If the database is 12.2.0 or higher at RMAN console

recover database not logged block;

This may be in the run block based the backup process defined using MML layer or disk. 

The block/s to be recovered can be identified by querying a metadata table as under:

select * from v$nonlogged_block;

Before doing the not logged block recovery:

If a stand by database (physical) is configured then cancel the recovery.

If it is READONLY stand by database shutdown the database

If it is bidirectional or unidirectional Oracle Goldengate configuration, stop that extract and replcat processes.

After RMAN command return to the CLI command prompt, with no errors, then at SQL*Plus prompt rerun the command

SELECT * FROM V$NONLOGGED_BLOCK ;

It the process is NOT listing any block that is previously listed corrupt, you are good. 

If the database version is below 12.2.0 then you may encounter

ORA-01578

Oracle 8i may list ORA-26040

 

Recovery cannot retrieve the NOLOGGING data.

NO data can be recovered 

 

The solution is based on the basc configuration media recovery is enabled or NOT.

Block-level media recovery

 

If it is header-block the block can NOT be recovered

If that is data block then connect to RMAN conlsole

rman

connect target /

connect catalog catalog_user/catalog_user-password@catalog_database ----- this is optional

at the RMAN prompt 

run { blockrecover datafile (datafile id) block (blck_id) } ------------ the inputs are from can be had from v$backup_corruption; table

alternatively, at RMAN prompt

list failure;

repair failure preview;

repair failure noprompt;

in 11.2 the RMAN recovery process sysntax changes are there.

backup validate check logical database archivelog all;

validate datafile n; or validate datafile n to n;

validate backupset n;

recover corruption list;

recover datafile 1/n block n,n,n,n; datafile n block n,n,n;

and 

recover datafile dn block n to n1 datafile dn block n to nn;

 

Restrictions:

 

Block 1 is always header and can NOT be recovered. RMAN behavior when you try to recover the header block is different. Block 1 (datafile header) is physically corrupt in this example with some garbage.   The RMAN block recover command does not error out but alert log is updated:


 

Options of recovery:

Block media recovery as discussed above

Datafile recovery (if the database is NOT in media recovery mode … then there wi;ll be data loss)

Tablespace recovery (if the database is in no media recovery mode then you will suffer data loss)

Database recovery (if media recovery is NOT enabled then there will be data loss)

After complete recovery –- if it is table

 

analyze .

 

If it is index, the recovery process is simple. Drop the index and recreate the index. Before dropping the index, check if there are any FK constraints referencing. If there are

 

Alter table .table_name> disable constraint

Rebuild the PK 

Disable the constraint

Drop the index

Create Index

Enable PK constraint

Enabling the FK – alter table . enable constraint ;

Indexes also can be rebuilt. Based on the situation you are in, use that.

Salvaging the tables with corrupt data.

Oracle 7.2 onwards we have DBMS_REPAIR.SKIP_CORRUPT_BLOCK. The following script (MOS supplied) may be used:

REM Create the repair table in a given tablespace:

 

BEGIN

  DBMS_REPAIR.ADMIN_TABLES (

  TABLE_NAME => 'REPAIR_TABLE',

  TABLE_TYPE => dbms_repair.repair_table,

  ACTION => dbms_repair.create_action,

  TABLESPACE => '&tablespace_name');

END;

/

 

REM Identify corrupted blocks for schema.object (it also can be done at partition level with parameter PARTITION_NAME):

 

set serveroutput on

DECLARE num_corrupt INT;

BEGIN

  num_corrupt := 0;

  DBMS_REPAIR.CHECK_OBJECT (

  SCHEMA_NAME => '&schema_name',

  OBJECT_NAME => '&object_name',

  REPAIR_TABLE_NAME => 'REPAIR_TABLE',

  corrupt_count => num_corrupt);

  DBMS_OUTPUT.PUT_LINE('number corrupt: ' || TO_CHAR (num_corrupt));

END;

/

 

REM Optionally display any corrupted block identified by check_object:

 

select BLOCK_ID, CORRUPT_TYPE, CORRUPT_DESCRIPTION

from REPAIR_TABLE;

 

REM Mark the identified blocks as corrupted ( Soft Corrupt - reference Note 1496934.1 )

DECLARE num_fix INT;

BEGIN

  num_fix := 0;

  DBMS_REPAIR.FIX_CORRUPT_BLOCKS (

  SCHEMA_NAME => '&schema_name',

  OBJECT_NAME=> '&object_name',

  OBJECT_TYPE => dbms_repair.table_object,

  REPAIR_TABLE_NAME => 'REPAIR_TABLE',

  FIX_COUNT=> num_fix);

  DBMS_OUTPUT.PUT_LINE('num fix: ' || to_char(num_fix));

END;

/

 

REM Allow future DML statements to skip the corrupted blocks:

 

BEGIN

  DBMS_REPAIR.SKIP_CORRUPT_BLOCKS (

  SCHEMA_NAME => '&schema_name',

  OBJECT_NAME => '&object_name',

  OBJECT_TYPE => dbms_repair.table_object,

  FLAGS => dbms_repair.SKIP_FLAG);

END;

/

 

NOTE: the above code is an example. Please use that at your own risk. 

LOB segments are to be handled differently

It is not possible to used DBMS_REPAIR on LOB segments.

If the corrupt LOB block is NOT referenced by any row in the table then it should be possible to CREATE TABLE as SELECT (CTAS) or export / drop / import the table as is.

If the corrupt LOB block is referenced by a row then it should be possible to select or export with a WHERE predicate that excludes the problem row/s.

 

If a standby database is running for the database that has corrupt blocks, then find the options available for recovery immediately when the corruption occurred. This may be difficult. 

 

Use DBVERIFY on the files restored and recovered to rule out corruption is avoided. 

Last Chance is when you have all the archived logs created after the datafile is created, drop the datafile with corrupt blocks. Recreate if

Shutdown the database.

Startup mount;

Alter database datafile ‘’ drop offline;

Alter database create datafile ‘ size m;

Recover datafile ‘…’;

Alter database ‘’ online;

Alter database open;

This activity can be online too. Then the database need NOT be in mount mode. It can be open. The objects and data stored, can throw errors when accessed.