Resource ‘ora.storage’ Failed to Start With Error ORA-15077
Yesterday I got a call from a customer asking me to take a look at their system, because there was a problem with the Clusterware, or more specifically with the ASM storage.
The system was a two-node Oracle RAC 12.1 cluster.
It was not so easy to troubleshoot this time, because the error messages led me in the wrong direction.
I started by checking the ASM alert log on the failing node and found these messages:
Fri Apr 17 21:09:14 2020
lmon registered with NM - instance number 2 (internal mem no 1)
Fri Apr 17 21:09:17 2020
Using default pga_aggregate_limit of 114037 MB
Fri Apr 17 21:11:17 2020
Instance Critical Process (pid: 11, ospid: 176833, LMON) died unexpectedly
PMON (ospid: 176812): terminating the instance due to error 481
Fri Apr 17 21:11:17 2020
System state dump requested by (instance=2, osid=176812 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_diag_176827_20200417211117.trc
Fri Apr 17 21:11:17 2020
Dumping diagnostic data in directory=[cdmp_20200417211117], requested by (instance=2, osid=176812 (PMON)), summary=[abnormal instance termination].
Fri Apr 17 21:11:18 2020
Instance terminated by PMON, pid = 176812
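(In case you need it: this is roughly how I pull up the ASM alert log; the path below is derived from the trace file reference above, so adjust the instance name and ADR base for your environment.)

$ # tail the ASM alert log of instance +ASM2 (path assumed from the standard ADR layout)
$ tail -200 /u01/app/grid/diag/asm/+asm/+ASM2/trace/alert_+ASM2.log

$ # or browse it with adrci
$ adrci
adrci> set homepath diag/asm/+asm/+ASM2
adrci> show alert -tail 200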
In the DIAG trace file I couldn't find any useful error message, so I also checked the ASM alert log on the first node, which was running fine:
Fri Apr 17 22:28:26 2020
LMON (ospid: 3456) detects hung instances during IMR reconfiguration
LMON (ospid: 3456) tries to kill the instance 2 in 37 seconds.
Please check instance 2's alert log and LMON trace file for more details.
Fri Apr 17 22:29:03 2020
Remote instance kill is issued with system inc 10
Remote instance kill map (size 1) : 2
LMON received an instance eviction notification from instance 1
The instance eviction reason is 0x20000000
The instance eviction map is 2
I opened My Oracle Support and searched for those messages. Of course I was happy to stumble upon this article after just two minutes:
ASM on Non-First Node (Second or Others) Fails to Start: PMON (ospid: nnnn): terminating the instance due to error 481 (Doc ID 1383737.1)
It seemed that we were hitting the same issue described there. As suggested in the note, I checked the HAIP and the firewall, and verified almost everything related to the IPs (VIPs, ifconfig, oifcfg, etc.) with checks like the ones below.
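For completeness, these are the kinds of checks I mean (a rough sketch; interface names and resource states will of course differ per environment):

$ # which interfaces are configured as public / cluster_interconnect
$ oifcfg getif

$ # state of the HAIP resource in the lower (OHASD) stack
$ crsctl stat res ora.cluster_interconnect.haip -init -t

$ # the interfaces themselves, including the 169.254.x.x HAIP addresses
$ ifconfig -a

$ # make sure no firewall rules are blocking the interconnect
$ iptables -L -n

But ASM still didn't want to come up: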
[grid/+ASM2@secret02 /]$ crsctl start res ora.asm -init
CRS-2672: Attempting to start 'ora.asm' on 'secret02'
CRS-5017: The resource action "ora.asm start" encountered the following error:
ORA-03113: end-of-file on communication channel
Process ID: 0
Session ID: 0 Serial number: 0
. For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/secret02/crs/trace/ohasd_oraagent_grid.trc".
CRS-2674: Start of 'ora.asm' on 'secret02' failed
CRS-2679: Attempting to clean 'ora.asm' on 'secret02'
CRS-2681: Clean of 'ora.asm' on 'secret02' succeeded
CRS-4000: Command Start failed, or completed with errors.
The error message "ORA-03113: end-of-file on communication channel" then led me to other possible issues. I read a couple of articles on MOS and blogs found via Google, but nothing seemed to match this case.
And then I finally went in the right direction, the one I should have taken at the very beginning: I checked the CRS logs and trace files, especially ohasd_orarootagent_root.trc, where I found this error message:
2020-04-17 21:09:12.438828 : USRTHRD:3988051712: {0:0:2} Error [kgfoAl06] in [kgfokge] at kgfo.c:2850
2020-04-17 21:09:12.438833 : USRTHRD:3988051712: {0:0:2} ORA-15077: could not locate ASM instance serving a required diskgroup
2020-04-17 21:09:12.438838 : USRTHRD:3988051712: {0:0:2} Category: 7
2020-04-17 21:09:12.438849 : USRTHRD:3988051712: {0:0:2} DepInfo: 15077
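By the way, finding the relevant trace file is quick with a grep over the Grid Infrastructure trace directory (the path follows the standard 12c layout on this cluster; replace the hostname with your own):

$ cd /u01/app/grid/diag/crs/secret02/crs/trace
$ # list every trace file mentioning the error
$ grep -l "ORA-15077" *.trc
ohasd_orarootagent_root.trc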
Unfortunately, it seems we hit Bug 28148720 – Resource 'ora.storage' Failed to Start With Error ORA-15077 (Doc ID 28148720.8), which is fixed in 19.1, and there is no interim patch for 12.1.
Therefore we have to request an interim patch from Oracle, plan the next upgrade, or simply reinstall the Clusterware on this node.
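As a quick sanity check before requesting anything from Oracle, you can verify whether the fix for this bug is already present in the Grid Home (a generic OPatch check; in our 12.1 home it is of course not listed):

$ # list all bug fixes known to the inventory and look for 28148720
$ $ORACLE_HOME/OPatch/opatch lsinventory -bugs_fixed | grep 28148720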