Category: Issues

Partial Outages Resolved

For the past couple of days we have had partial outages: network endpoints failing and routing ceasing to function for five to ten minutes at a time, along with high latency. Unfortunately, this happened while I was traveling for work, and neither I nor the other tech who helps with the NOC had time to triage and figure out what had happened.

After finally getting a moment to take a look at the network, I found that three of our routers were maxing out their allotted CPU limits and failing to route traffic. A closer look at the affected routers showed that the last firmware upgrade had reset our logging settings to defaults. As a result, all routers on our network were starting to run out of disk space, which caused them to malfunction in a way that maxed out their other resources. This also took the GUI offline, meaning nothing could be done to fix the problem until I was able to tunnel in and pop a shell.

The fix was to remote into each router, delete the offending log data, and then reboot the routers one by one. Once that was handled, log limits were reconfigured in the GUI, and services were brought back online and tested to ensure that all parts of the network were functional again. Furthermore, alert rules were placed into the NMS to fire off both email (SMS) and Discord notifications should any router’s disk usage exceed 65%. This should allow us to catch any future issues like this before they affect the network’s usability.
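
As a rough illustration of the 65% disk check described above, here is a minimal Python sketch. It is not the actual NMS rule: the function names, the "/" path, and the printed messages are all assumptions made for the example, and the real alerts are sent by the NMS rather than by a script like this.

```python
import shutil

# Alert level taken from the post; everything else here is hypothetical.
THRESHOLD_PERCENT = 65.0


def disk_used_percent(path: str = "/") -> float:
    """Return the percentage of disk space used on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100.0


def should_alert(used_percent: float, threshold: float = THRESHOLD_PERCENT) -> bool:
    """True when usage is strictly above the threshold."""
    return used_percent > threshold


if __name__ == "__main__":
    pct = disk_used_percent("/")
    if should_alert(pct):
        # A real setup would send email or Discord notifications here.
        print(f"ALERT: disk is {pct:.1f}% used (threshold {THRESHOLD_PERCENT:.0f}%)")
    else:
        print(f"OK: disk is {pct:.1f}% used")
```

Alerting well below 100% is the point of the design: it leaves time to clear logs while the GUI is still reachable.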

Caching Oops…

It seems that our caching service, combined with our temporary anti-VPN stance during the ongoing issue with a specific individual, caused the cache to pick up the VPN block page and display it as our homepage. I let the NOC know what appears to be happening, and they should be working on this shortly. Sorry for the oops.

Anubis is Down

Anubis will be temporarily offline, meaning we are serving our website directly for the moment. I need to make some changes to the configuration because we have noticed some issues in logging.

Network Issues, Failures with Aedon, Phy Two

We are aware that our secondary physical server is experiencing network issues. Accounting, DNS, and NMS keep going down and flooding our Discord with service alerts. Our schedule has us on the road and unable to diagnose the cause right now. I will get the smol raptor to check things over when we get a break and are able to catch up later tonight, after our flight to Calgary.