Wednesday 25 January 2012

Oracle RAC node killed by CRS

I recently ran into this error message in the CSSD log:


[    CSSD]2012-01-12 15:24:19.352 [1199618400] >TRACE:   clssnmWaitThread: thrd(2), timeout(1000), wakeonpost(0)
[    CSSD]2012-01-12 15:24:19.353 [1220598112] >ERROR:   ###################################
[    CSSD]2012-01-12 15:24:19.353 [1220598112] >ERROR:   clssscExit: CSSD aborting from thread clssnmRcfgMgrThread
[    CSSD]2012-01-12 15:24:19.353 [1220598112] >ERROR:   ###################################

First of all, the log file is located in $CRS_HOME/log//cssd/
and the file name is ocssd.log.
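
Because the useful hint is often in the entries just before the ERROR lines, it can help to pull each ERROR out of ocssd.log together with the line or two that precede it. The following is a minimal sketch, assuming every entry follows the same layout as the excerpt above; the names parse_line, errors_with_context and SAMPLE are made up for this illustration and are not part of any Oracle tool.

```python
import re
from collections import deque, namedtuple

# One parsed log entry; the field names are illustrative, not Oracle's.
Entry = namedtuple("Entry", "component timestamp thread level message")

# Assumed layout: [    CSSD]2012-01-12 15:24:19.352 [1199618400] >TRACE:   text
LINE_RE = re.compile(
    r"^\[\s*(?P<component>\w+)\]"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"\[(?P<thread>\d+)\]\s+>(?P<level>\w+):\s*(?P<message>.*)$"
)


def parse_line(line):
    """Return an Entry for a line in the assumed layout, or None otherwise."""
    match = LINE_RE.match(line.strip())
    return Entry(**match.groupdict()) if match else None


def errors_with_context(lines, before=2):
    """Yield (preceding_entries, error_entry) for every ERROR entry.

    preceding_entries holds up to `before` non-ERROR entries seen most
    recently before the error.
    """
    recent = deque(maxlen=before)
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # skip lines that do not match the assumed layout
        if entry.level == "ERROR":
            yield list(recent), entry
        else:
            recent.append(entry)


# Lines copied from the excerpt in this post.
SAMPLE = [
    "[    CSSD]2012-01-12 15:24:19.352 [1199618400] >TRACE:   clssnmWaitThread: thrd(2), timeout(1000), wakeonpost(0)",
    "[    CSSD]2012-01-12 15:24:19.353 [1220598112] >ERROR:   ###################################",
    "[    CSSD]2012-01-12 15:24:19.353 [1220598112] >ERROR:   clssscExit: CSSD aborting from thread clssnmRcfgMgrThread",
    "[    CSSD]2012-01-12 15:24:19.353 [1220598112] >ERROR:   ###################################",
]

if __name__ == "__main__":
    for context, error in errors_with_context(SAMPLE, before=1):
        for entry in context:
            print("  before:", entry.timestamp, entry.message)
        print("ERROR:", error.timestamp, error.message)
```

To run it against a real log, pass an open file object instead of SAMPLE; the function only needs an iterable of lines.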

There can be many reasons for this error, but in a nutshell CSSD has cut the local node off from the rest of the RAC cluster. In this case, the TRACE line just above the ERROR lines gives a hint that a timeout is occurring. Investigating the log files further, I noticed that the heartbeat between the nodes was not fast enough.
When I checked the network interface used by the interconnect, I found that it was running at a slow speed.

Changing the speed of the network interface resolved the problem.
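
A quick way to check the negotiated link speed is sketched below. It assumes a Linux host, where the kernel exposes the speed in Mb/s under /sys/class/net/<interface>/speed. The interface name eth1 and the 1000 Mb/s threshold are assumptions for the example, so substitute the interface your interconnect really uses and the speed you expect from it.

```python
from pathlib import Path


def link_speed_mbps(interface, sysfs_root="/sys/class/net"):
    """Return the link speed in Mb/s, or None if it cannot be determined.

    Reads the Linux sysfs 'speed' attribute. Some drivers report -1 or
    refuse the read when the link is down, so both cases map to None.
    """
    speed_file = Path(sysfs_root) / interface / "speed"
    try:
        value = int(speed_file.read_text().strip())
    except (OSError, ValueError):
        return None
    return value if value > 0 else None


def is_too_slow(interface, minimum_mbps=1000, sysfs_root="/sys/class/net"):
    """True only when the speed is known and below minimum_mbps."""
    speed = link_speed_mbps(interface, sysfs_root)
    return speed is not None and speed < minimum_mbps


if __name__ == "__main__":
    iface = "eth1"  # hypothetical interconnect interface name
    print(iface, "speed (Mb/s):", link_speed_mbps(iface))
    print(iface, "too slow:", is_too_slow(iface))
```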
 

Tuesday 3 January 2012

BOOK REVIEW: Oracle 11g R1/R2 RAC Essentials

ORACLE 11g R1/R2 RAC ESSENTIALS
ISBN 978-1-849682-66-4



There has always been a need for a comprehensive book on this topic, and I think the authors have made a very good effort with this one. I have bought the book and, as I go through the chapters, I'll update this page with my observations.


In the first chapter, under the topic "High Availability: Oracle 11g R1 RAC", the authors mention that RAC is not a true disaster recovery solution because it does not protect against site failure or database failure.
I think this point needs further clarification. In my opinion it depends on your setup. I recently created a RAC database using ASM disks with normal redundancy across two physically separate locations. In this case not only are the nodes in different data centers, but the storage is also split across the two data centers: the ASM disk group has two failure groups, one on each site, over a stretched SAN. An instance is the set of background processes, whereas the database is the data on storage. In the setup described above, both the database and an instance will remain available in case of a site failure.
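
The reasoning behind this can be shown with a toy model. With normal redundancy every extent is kept in two copies that live in different failure groups, so with one failure group per site each site holds a copy of everything. The sketch below is only an illustration of that idea, not ASM's actual allocation algorithm; the failure group names site_a and site_b and both function names are made up.

```python
def mirror_extents(n_extents, failgroups):
    """Place two copies of each extent in two different failure groups.

    Returns a list of (primary_failgroup, mirror_failgroup) pairs.
    Simple round-robin placement, for illustration only.
    """
    names = sorted(set(failgroups))
    if len(names) < 2:
        raise ValueError("two-way mirroring needs at least two failure groups")
    placement = []
    for i in range(n_extents):
        primary = names[i % len(names)]
        mirror = names[(i + 1) % len(names)]  # always differs from primary
        placement.append((primary, mirror))
    return placement


def survives_loss(placement, lost_failgroup):
    """True if every extent still has a copy outside the lost failure group."""
    return all(
        any(fg != lost_failgroup for fg in copies) for copies in placement
    )


if __name__ == "__main__":
    layout = mirror_extents(8, ["site_a", "site_b"])
    for site in ("site_a", "site_b"):
        print("lose", site, "-> data still complete:", survives_loss(layout, site))
```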

Further points still to come...