Our status graphs and a few v6-only services should be coming back online shortly. We received word from FurrIX, our network provider, that they had a minor issue with their secondary IPv6 network that caused some functions to stop working from Mar 20th to Mar 24th. As far as I am aware, nothing other than our status graphs and the v6 interface for NS2 was affected during this time.
Category: Network
Posts dealing with changes to how we route packets and configure our network.
We have received notification from the NOC of our networking provider that they have chosen not to renew the domains ‘birb.rest’ and ‘avali.rest’ for cost reasons. These domains were only used for personal splash pages and a handful of user subdomains that have not seen lookups in some time. This should not affect our operations in any meaningful way.
We have also heard that FurrIX is dealing with a DNS amplification attack and will be temporarily dropping any IP address that crosses 40 requests per second until the incoming traffic targeting a few specific domains has let up.
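We don't have visibility into how FurrIX implements this limit on their gear, but the per-source threshold they describe amounts to a sliding-window counter. A minimal sketch of that logic (class and method names are illustrative, not anything FurrIX runs):

```python
import time
from collections import defaultdict, deque

class RateTracker:
    """Flag any source IP that exceeds max_per_second requests
    within a one-second sliding window."""

    def __init__(self, max_per_second=40):
        self.max_per_second = max_per_second
        self.windows = defaultdict(deque)  # src_ip -> timestamps of recent requests

    def should_drop(self, src_ip, now=None):
        """Record one request from src_ip; return True once it crosses the limit."""
        now = time.monotonic() if now is None else now
        window = self.windows[src_ip]
        window.append(now)
        # Discard timestamps that fell out of the one-second window.
        while window and now - window[0] > 1.0:
            window.popleft()
        return len(window) > self.max_per_second
```

On real routing hardware this would be a dynamic meter in the packet filter rather than application code, but the counting behavior is the same: the 41st request inside one second trips the drop.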
We are seeing absurd amounts of WordPress attack traffic originating from Singapore and have updated our network rules to drop traffic from that country for the time being.
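Country-level blocks like this are typically built by loading a geo-IP feed of CIDR ranges into the firewall. The matching logic itself is simple; a sketch (the ranges below are RFC 5737 documentation prefixes for illustration, not Singapore's actual allocations):

```python
import ipaddress

# Placeholder ranges; a real deployment would load these from a geo-IP feed.
BLOCKED_RANGES = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]

def is_blocked(src_ip: str) -> bool:
    """Return True if src_ip falls inside any blocked CIDR range."""
    addr = ipaddress.ip_address(src_ip)
    return any(addr in net for net in BLOCKED_RANGES)
```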
For the past couple of days there have been partial outages in the form of network endpoint failures and routing ceasing to function for five- to ten-minute intervals, along with high latency. Unfortunately, this occurred while I was traveling for work, and neither I nor the other tech who helps with the NOC had time to triage and figure out what had happened.
After finally getting a moment to look at the network, I found that three of our routers were maxing out their allotted CPU limits and failing to route traffic. On closer inspection of the affected routers, I discovered that the last firmware upgrade had reset our logging settings to defaults, which meant that every router on our network was beginning to run out of disk space and malfunctioning in a way that maxed out its other resources. This also took the GUI offline, so nothing could be done until I was able to tunnel in and pop a shell.
The fix was to remote into each router, delete the offending log data, and reboot the routers one by one. Once that was handled, log limits were reconfigured in the GUI and services were brought back online and tested to ensure that all parts of the network were functional again. Furthermore, alert rules were placed into the NMS to fire off both email (SMS) and Discord notifications should any router’s disk become over 65% used. This should allow us to catch future issues like this before they affect the network’s usability.
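The alert rule itself lives in the NMS, but the threshold check it performs amounts to something like this sketch (function names and the notification hook are illustrative, not our actual NMS configuration):

```python
import shutil

DISK_ALERT_THRESHOLD = 0.65  # alert once a disk is more than 65% used

def disk_usage_fraction(path="/"):
    """Fraction of the filesystem at `path` that is currently in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total

def should_alert(used_fraction, threshold=DISK_ALERT_THRESHOLD):
    """True when usage is strictly over the threshold, so 65% exactly
    does not fire but anything above it does."""
    return used_fraction > threshold
```

Polling each router's disk on a schedule and feeding the fraction through a check like this is what gives us the early warning before logs fill the disk again.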
