A business system's listener terminated abnormally roughly every 10 days

Published: 2021-08-17 12:58  Source: ITPUB blog  Category: Database

1 Checked the listener log and found that the listener shutdown was issued by CRS.

2 Checked the CRS alert log and found error messages in it.


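3 Checked crsd.log and found abnormal state transitions for the listener. The problem matches the document "VIP, SCAN VIP/Listener Fails Over and Listener Stops After Short Public Network Hiccup" (Doc ID 1333165.1); points 4 and 5 below are also consistent with that document.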

4 Checked orarootagent_root.log



5 Checked oraagent_<user>.log: the listener was forcibly stopped and the restart failed.


Cause:

Taken together, points 1-5 match the document "VIP, SCAN VIP/Listener Fails Over and Listener Stops After Short Public Network Hiccup" (Doc ID 1333165.1), so the configuration was changed as that document describes.

 

The relevant official Oracle document follows:

VIP, SCAN VIP/Listener Fails Over and Listener Stops After Short Public Network Hiccup (Doc ID 1333165.1)

 

APPLIES TO:

Oracle Database - Enterprise Edition - Version 11.2.0.1 to 12.1.0.1 [Release 11.2 to 12.1]

Information in this document applies to any platform.

SYMPTOMS

After a check times out, the 11gR2 Grid Infrastructure network resource (usually ora.net1.network) goes to the INTERMEDIATE state, then returns to ONLINE very shortly afterward. This note does not discuss the cause of the check timeout, but the most common cause is a public network hiccup.

 

Once the network resource goes into the INTERMEDIATE state, resource dependencies may cause the VIP, services, SCAN VIP/SCAN listener, ora.cvu, ora.ons and so on to fail over or go offline, which can result in unnecessary connectivity issues for that period. After the network resource is back online, the affected resources may not come back online.

 

·  $GRID_HOME/log/<node>/crsd/crsd.log

2011-06-12 07:12:31.261: [    AGFW][10796] {0:1:2881} Received state change for ora.net1.network racnode1 1 [old state = ONLINE, new state = UNKNOWN]

2011-06-12 07:12:31.261: [    AGFW][10796] {0:1:2881} Received state LABEL change for ora.net1.network racnode1 1 [old label  = , new label = CHECK TIMED OUT]

..

2011-06-12 07:12:31.297: [   CRSPE][12081] {0:1:2881} RI [ora.net1.network racnode1 1] new external state [INTERMEDIATE] old value: [ONLINE] on racnode1 label = [CHECK TIMED OUT] 

..

2011-06-12 07:12:31.981: [    AGFW][10796] {0:1:2882} Received state change for ora.net1.network racnode1 1 [old state = UNKNOWN, new state = ONLINE]

..

2011-06-12 07:12:32.307: [   CRSPE][12081] {0:1:2881} RI [ora.LISTENER.lsnr racnode1 1] new internal state: [STOPPING] old value: [STABLE]

2011-06-12 07:12:32.308: [   CRSPE][12081] {0:1:2881} CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'racnode1'
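To spot the ONLINE -> UNKNOWN -> ONLINE flapping shown above without reading the whole file, the state-change lines can be filtered out of crsd.log. The following is a minimal sketch, not part of the Oracle note: the function name `list_state_changes` is hypothetical, and it assumes the log lines have the same layout as the excerpt above.

```shell
# Sketch: print "timestamp resource OLD -> NEW" for every
# "Received state change" line read from standard input.
# Assumes the crsd.log line layout shown in the excerpt above.
list_state_changes() {
  sed -n 's/^\([0-9-]* [0-9:.]*\): .*Received state change for \([^ ]*\) .*\[old state = \([A-Z]*\), new state = \([A-Z]*\)\].*$/\1 \2 \3 -> \4/p'
}

# Possible usage (the node directory name is an assumption):
#   list_state_changes < "$GRID_HOME/log/racnode1/crsd/crsd.log"
```

Lines that only report a state label change do not match the pattern, so the output contains one line per actual state transition.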

·  $GRID_HOME/log/<node>/agent/crsd/orarootagent_root/orarootagent_root.log

2011-06-12 07:12:08.965: [    AGFW][2070] {1:27767:2} Created alert : (:CRSAGF00113:) :  Aborting the command: check for resource: ora.net1.network racnode1 1

2011-06-12 07:12:08.966: [ora.net1.network][2070] {1:27767:2} [check] clsn_agent::abort {

..

2011-06-12 07:12:31.257: [    AGFW][2070] {1:27767:2} Command: check for resource: ora.net1.network racnode1 1 completed with status: TIMEDOUT

2011-06-12 07:12:31.258: [    AGFW][2314] {1:27767:2} ora.net1.network racnode1 1 state changed from: ONLINE to: UNKNOWN

2011-06-12 07:12:31.258: [    AGFW][2314] {1:27767:2} ora.net1.network racnode1 1 would be continued to monitored!

2011-06-12 07:12:31.258: [    AGFW][2314] {1:27767:2} ora.net1.network racnode1 1 state details has changed from:  to: CHECK TIMED OUT

..

2011-06-12 07:12:31.923: [ora.net1.network][2314][F-ALGO] {1:27767:2} CHECK initiated by timer for: ora.net1.network racnode1 1

..

2011-06-12 07:12:31.973: [ora.net1.network][8502][F-ALGO] {1:27767:2} [check] Command check for resource: ora.net1.network racnode1 1 completed with status ONLINE

2011-06-12 07:12:31.978: [    AGFW][2314] {1:27767:2} ora.net1.network racnode1 1 state changed from: UNKNOWN to: ONLINE

·  $GRID_HOME/log/<node>/agent/crsd/oraagent_<user>/oraagent_<user>.log

2011-06-12 07:12:32.335: [    AGFW][2314] {0:1:2881} Agent received the message: RESOURCE_STOP[ora.LISTENER.lsnr racnode1 1] ID 4099:14792

2011-06-12 07:12:32.335: [    AGFW][2314] {0:1:2881} Preparing STOP command for: ora.LISTENER.lsnr racnode1 1

2011-06-12 07:12:32.335: [    AGFW][2314] {0:1:2881} ora.LISTENER.lsnr racnode1 1 state changed from: ONLINE to: STOPPING

 

·  $GRID_HOME/log/<node>/alert<node>.log

2012-01-10 06:48:18.474 [/ocw/grid/bin/orarootagent.bin(10485902)]CRS-5818:Aborted command 'check for resource: ora.net1.network racnode1 1' for resource 'ora.net1.network'. Details at (:CRSAGF00113:) {1:24200:2} in /ocw/grid/log/racnode1/agent/crsd/orarootagent_root/orarootagent_root.log.

2012-01-10 06:48:43.481 [/ocw/grid/bin/oraagent.bin(8847542)]CRS-5016:Process "/ocw/grid/bin/lsnrctl" spawned by agent "/ocw/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/ocw/grid/log/racnode1/agent/crsd/oraagent_grid/oraagent_grid.log"

2012-01-10 06:48:43.552 [/ocw/grid/bin/oraagent.bin(8847542)]CRS-5016:Process "/ocw/grid/opmn/bin/onsctli" spawned by agent "/ocw/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/ocw/grid/log/racnode1/agent/crsd/oraagent_grid/oraagent_grid.log"

 

 

 

CAUSE

 

SOLUTION

The issue is addressed by fixes for several different bugs:

1. Bug 12680491 fixes the dependency between the network resource and the VIP

 

The fix for bug 12680491 adds the intermediate modifier to the stop dependency between the network resource and the VIP to avoid unnecessary resource state changes. It is included in 11.2.0.2 GI PSU4, 11.2.0.3 GI PSU3, 11.2.0.3 Windows Patch 7, and 11.2.0.4 and above. This fix is recommended over the fix for bug 12378938 because it avoids the issue in the first place.

 

Once the patch for this bug is applied, the following needs to be executed to change the dependency for all VIPs:

# $GRID_HOME/bin/crsctl modify res ora.<racnode1>.vip -attr "STOP_DEPENDENCIES=hard(intermediate:ora.<net1>.network)"

 

For example:

# /ocw/grid/bin/crsctl modify res ora.racnode1.vip -attr "STOP_DEPENDENCIES=hard(intermediate:ora.net1.network)"

Once the attribute is changed, nodeapps/VIP must be restarted for the change to take effect.
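On a cluster with several nodes the same command has to be repeated for every node VIP. The sketch below only prints the commands for review and does not run them. The function name `print_vip_commands` and the second node name `racnode2` are hypothetical; the Grid home, node names, and network resource name must be replaced with the real values.

```shell
# Sketch: print (do not execute) the crsctl modify command for each node VIP.
# Arguments: <grid_home> <network_name> <node>...
print_vip_commands() {
  local grid_home="$1" net="$2"
  shift 2
  local node
  for node in "$@"; do
    printf '%s/bin/crsctl modify res ora.%s.vip -attr "STOP_DEPENDENCIES=hard(intermediate:ora.%s.network)"\n' \
      "$grid_home" "$node" "$net"
  done
}

# Possible usage (racnode2 is an illustrative node name):
#   print_vip_commands /ocw/grid net1 racnode1 racnode2
```

Printing first and running the reviewed lines as root afterwards keeps a typo in a node or network name from changing the wrong resource.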

 

2. Bug 13582411 fixes the dependency between the network resource and the SCAN VIP/listener

The fix for bug 13582411 adds the intermediate modifier to the stop dependency between the network resource and the SCAN VIP to avoid unnecessary resource state changes. It is included in 11.2.0.3 GI PSU4, and 11.2.0.4 and above.

 

Once the patch for this bug is applied, the following needs to be executed to change the dependency for all SCAN VIPs and to restart the SCAN VIPs:

# $GRID_HOME/bin/crsctl modify res ora.scan<n>.vip -attr "STOP_DEPENDENCIES=hard(intermediate:ora.net<n>.network)"

For example:

# /ocw/grid/bin/crsctl modify res ora.scan1.vip -attr "STOP_DEPENDENCIES=hard(intermediate:ora.net1.network)"

# /ocw/grid/bin/crsctl modify res ora.scan2.vip -attr "STOP_DEPENDENCIES=hard(intermediate:ora.net1.network)"

# /ocw/grid/bin/crsctl modify res ora.scan3.vip -attr "STOP_DEPENDENCIES=hard(intermediate:ora.net1.network)"

# /ocw/grid/bin/srvctl stop scan -f

$ /ocw/grid/bin/srvctl start scan_listener 
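After the resources are modified, it may be worth confirming that the new dependency is actually in each resource profile. The following sketch checks profile text read from standard input. The function name `has_intermediate_modifier` is hypothetical, and it assumes the profile is printed as KEY=VALUE lines, as `crsctl stat res <resource> -p` does.

```shell
# Sketch: succeed (exit status 0) if the profile text on standard input has a
# STOP_DEPENDENCIES line that uses the intermediate modifier on a network resource.
# Assumes KEY=VALUE profile lines such as those printed by "crsctl stat res <name> -p".
has_intermediate_modifier() {
  grep -q '^STOP_DEPENDENCIES=.*intermediate:ora\.[^.]*\.network'
}

# Possible usage:
#   /ocw/grid/bin/crsctl stat res ora.scan1.vip -p | has_intermediate_modifier \
#     && echo "ora.scan1.vip: intermediate modifier present"
```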

 

3. Bug 17435488 fixes the dependency between the network resource and ora.cvu and ora.ons

The fix adds the intermediate modifier to the stop dependency between the network resource and ora.cvu and ora.ons to avoid unnecessary resource state changes. It is included in 12.1.0.2.

