Self‑Test and RN Boot Problems
Use the following table to troubleshoot self‑test failures with agent‑based and agent‑less PNs as reported by the onQ Portal and one or more of the alerts outlined in Self‑Test Alerts. For each symptom, this table lists the most likely cause first; therefore, verify the possible causes in the order that they are listed.
Before You Begin: Verify that the RN is in a usable state by booting it in test mode. If the RN fails to boot, reinitialize it to eliminate the possibility that the problem is with the RN build process itself. Afterward, proceed with the table below.
Symptom
Possible Cause
Solution
RN may or may not boot
Occurs immediately after a Linux PN enrollment: a typo in the boot loader menu, or a mismatch between the kernel version and the initramfs version.
If you’re in the process of enrolling the Linux PN for the first time and the RN won’t boot, see the troubleshooting tips in (Agent‑based Linux PNs) Enroll protected nodes or (Agent‑less Linux/Windows PNs) Enroll protected nodes.
Operating system is not supported.
Verify that the operating system is supported. Go to Platform Support.
RN has insufficient resources.
If the RN boots, verify that it has adequate memory and disk space: go to Monitor disk space and memory usage. If it does not, allocate more resources to the RN.
Unresolved dependencies led to the onQ Service not starting on time on the RN.
Determine whether the RN depends on another service, then make that service available. Some services take time to start; they might eventually succeed, fail, or time out, and in each case they delay the onQ Service from starting on time. Such services might rely on other resources, internal or external. For example, a service might be waiting for a mount manager (internal resource) to process huge mount points left over in the registry, or for a time server or Domain Controller (external resource) to become available.
RHEL 7 RN cannot boot and self-test fails
/boot/grub/grub.conf.xvf5 boot menu doesn’t exist or has incorrect contents
RHEL 7 RN boots with desired IP address, but self-test fails
Misconfigured firewall rules.
The default firewall service on RHEL 7 is firewalld; however, you can use the iptables service instead. For specific instructions, go to (RHEL 7.0) To enroll an agent‑based Linux PN or (RHEL 7.0/ESXi) To enroll an agent‑less Linux PN.
RN booted with correct IPs but no network activity
PN has a dynamic IP address.
Make sure that the PN has a static IP.
PN is running on an OEM version
Make sure that the PN is not running an OEM version, because networking might be disabled on such versions. For more information, go to Platform Support.
BSOD and RN cannot boot
RN has a faulty service
If there is a blue screen, read the blue screen code as it might indicate the service that caused the blue screen. Disable that faulty service; if that doesn’t work, capture the BSOD screen and contact Quorum Support.
RN/PN takes too long to boot
RN has a slow‑to‑start service
If the RN takes more than 15 minutes to come up, check the Windows event log to determine which services are slow to start or are timing out. Disable or reconfigure those services to shorten the boot time.
PN has a large registry
Check the PN's boot time. If the PN has a long boot time, the problem needs to be addressed at your PN site. For example, the PN might have a large registry that causes the PN to boot slowly. Resolve the root cause of the growing registry and clean up the registry.
RN reboots continuously
Windows Server 2012 R2 Bug
(Windows Server 2012 R2) If the RN continues to reboot, boot it in self‑mode, then search the Windows event logs for the suspicious service and disable it. For example, the ShellHWDetection service is known to cause continuous reboots on an RN running Windows Server 2012 R2; disabling it resolves the problem.
RN boots, but has no XenServer NIC hardware
3rd party software
In specific cases, some 3rd party software applications prevent the XenServer NIC from functioning.
For example, Symantec Endpoint Protection 12.1.4100.426 blocks the XenServer NIC from functioning; disabling the Symantec Endpoint Protection services resolves the problem.
RN boots with XenServer NIC, but has an incorrect IP address
Misconfigured RN network
Check the RN NIC configuration and IP configuration via the onQ Portal. Verify that xvf.dat exists on the PN and that the PN does not have a custom network configured. Make sure the network configuration is correct.
RN boots with XenServer NIC, but has no IP address
Misconfigured cluster network
The RN does not own the cluster resources as set forth in the cluster policy, so the IP address is inactive. If you want the RN to own the cluster resources, use the Enable Cluster Support parameter or change the cluster policy.
3rd party software
In specific cases, some 3rd party software applications prevent the IPs from attaching to the XenServer NIC. For example, network load balancing (NLB) software might block an IP from being used. In particular, the WLBS Windows NLB service blocks NIC IPs on Windows Server 2012 R2; if required, delete the WLBS service on the RN build via the onQ Portal. Disable the WLBS service on other Windows Server platforms as needed.
RN booted with correct IPs but no network activity or limited network connectivity
3rd party software
Check whether any 3rd party network management software is installed, such as antivirus, firewall, or network manager software. If so, disable this software on the RN build.
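One of the causes listed in the table is a mismatch between the kernel version and the initramfs version in the boot menu. The following Python sketch shows one way to spot such a mismatch in the text of a boot menu file. It is illustrative only and rests on assumptions not stated in this topic: that the file uses legacy GRUB syntax (title, kernel, and initrd lines) and that files follow the vmlinuz-<version> and initramfs-<version>.img naming convention. The function names are hypothetical and are not part of onQ.

```python
import re

# Assumed naming convention: vmlinuz-<version> and initramfs-<version>.img
KERNEL_RE = re.compile(r"^kernel\s+\S*vmlinuz-(\S+)")
INITRD_RE = re.compile(r"^initrd\s+\S*initramfs-(\S+?)\.img")


def parse_entries(grub_conf_text):
    """Collect the title, kernel version, and initramfs version of each menu entry."""
    entries = []
    current = None
    for line in grub_conf_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        parts = stripped.split(None, 1)
        if parts[0] == "title":
            current = {
                "title": parts[1] if len(parts) > 1 else "",
                "kernel": None,
                "initramfs": None,
            }
            entries.append(current)
            continue
        if current is None:
            continue  # global settings that precede the first entry
        kernel_match = KERNEL_RE.match(stripped)
        if kernel_match:
            current["kernel"] = kernel_match.group(1)
        initrd_match = INITRD_RE.match(stripped)
        if initrd_match:
            current["initramfs"] = initrd_match.group(1)
    return entries


def find_version_mismatches(grub_conf_text):
    """Return the entries whose kernel and initramfs versions differ."""
    return [
        entry
        for entry in parse_entries(grub_conf_text)
        if entry["kernel"] and entry["initramfs"]
        and entry["kernel"] != entry["initramfs"]
    ]
```

An empty result means that every entry naming both files uses matching versions. It does not confirm that the files exist on disk or that the rest of the menu is correct.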
If the self‑test still fails or the RN cannot boot, contact Quorum Support, providing the information outlined in Helpful System Information. In addition, answer the following questions to speed up resolution:
Is there a PN with the same operating system on the same onQ Appliance where self‑tests are passing?
Is this a newly protected PN?
Is this PN experiencing sudden self‑test failures with prior success?
(Important!) In addition:
Did you provide a screenshot of the RN’s Programs and Features (Control Panel > Programs and Features)? If not, please do so. This output tells Quorum what 3rd party software is installed.
Did you include a screenshot of the RN’s BSOD, if applicable? If not, please do so. This output might indicate the offending service.
Did you include the RN’s output from the systeminfo command? If not, please do so.
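If Python happens to be available on the RN, the systeminfo output can be saved to a file for attaching to a support request. The following is a minimal sketch, not a Quorum-provided tool: the function name and the output file name are hypothetical, and redirecting the command's output to a file from a command prompt works equally well.

```python
import subprocess
from pathlib import Path


def save_command_output(command, out_path, timeout=120):
    """Run a command, write its standard output to out_path, and return the text."""
    result = subprocess.run(
        command,
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode the output to str
        timeout=timeout,      # give up if the command hangs
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    Path(out_path).write_text(result.stdout, encoding="utf-8")
    return result.stdout
```

On a Windows RN, calling save_command_output(["systeminfo"], "rn_systeminfo.txt") would write the report to rn_systeminfo.txt in the current directory.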