It’s April 2015 and we have started receiving calls that our servers have vanished. Not good. Investigation reveals that a change in Google’s DNS handling means that, as far as Google’s DNS servers are concerned, our servers don’t exist. And we’re not the only ones with this problem: it’s common to systems using CNAME aliases to Amazon AWS ELBs (Elastic Load Balancers).
Here’s how to determine if this problem is affecting you and how to resolve it – short-term and long-term.
Google has very recently changed how its resolvers behave when a name server doesn’t respond nicely to a UDP request for a CNAME record. Now, if the response is not “nice”, Google does not get the CNAME record, so when you ask Google’s servers for it, it’s not found.
Google is helping by white-listing servers in the meantime, and will hopefully come to a longer-term solution, since the problem looks to be affecting lots of people.
The Google Public DNS Discussion Group is where you can see the problems and is where you request help on the whitelisting discussed below.
The problem will typically show itself in one of these ways:
- customers saying your web site is not found
- customers can’t ping or look up (e.g. with nslookup) your server name (e.g. myserver.mydomain.com)
- parts of your software are logging errors like “server not found”
- nslookup or dig are returning errors like “SERVFAIL” or “Warning: Message parser reports malformed message packet”
The errors / indicators above cover a variety of problems, so we next need to confirm whether the problem is the specific case mentioned – Google’s recent changes to DNS.
Confirm the Cause is Google’s Change
Below, replace “myserver.mydomain.com” with the name you are having trouble with.
1. First test: Can Google DNS see your server (DNS entry)?
nslookup myserver.mydomain.com 8.8.8.8
The command above will look up your server name using Google DNS server 8.8.8.8. Google should know your server name (especially if this problem has just turned up).
If you get an error like this:
*** google-public-dns-a.google.com can’t find myserver.mydomain.com: Non-existent domain
then we need to carry on. If you don’t get an error, your problem lies elsewhere and unfortunately this blog probably won’t help you.
2. Second Test – Can other DNS servers see your server?
The following commands will test your domain against other well known servers. Google has a great page about diagnosing DNS issues where I found these server names.
nslookup myserver.mydomain.com 184.108.40.206
nslookup myserver.mydomain.com 220.127.116.11
nslookup myserver.mydomain.com 18.104.22.168
nslookup myserver.mydomain.com 22.214.171.124
If those commands successfully look up your server name, then the problem IS GOOGLE SPECIFIC – carry on to try to fix the problem below.
If the commands also have errors, then your problem is not Google-DNS specific. You’ll need to go back to some fundamentals about your server name entry in DNS (has it been deleted or changed?). If LOTS of DNS servers can’t see your server name then there’s a broad issue.
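If you would rather script the two tests than type each command, here is a rough Python sketch. It is an illustration only: the function names (`looks_failed`, `lookup_ok`, `diagnose`) and the list of failure strings are assumptions, and the string matching is a heuristic because nslookup words its errors differently on Windows and Linux.

```python
import subprocess

GOOGLE_DNS = "8.8.8.8"  # Google Public DNS

# Substrings that typically show up in nslookup output when a lookup fails.
# Heuristic only: the exact wording varies between nslookup implementations.
FAILURE_MARKERS = (
    "can't find",
    "non-existent domain",
    "nxdomain",
    "servfail",
    "timed out",
    "no servers could be reached",
)


def looks_failed(nslookup_output: str) -> bool:
    """Return True if the output contains one of the known failure markers."""
    # Normalise case and typographic apostrophes before matching.
    text = nslookup_output.lower().replace("\u2019", "'")
    return any(marker in text for marker in FAILURE_MARKERS)


def lookup_ok(name: str, resolver: str, timeout: float = 15.0) -> bool:
    """Run `nslookup name resolver` and report whether it appeared to succeed."""
    try:
        proc = subprocess.run(
            ["nslookup", name, resolver],
            capture_output=True, text=True, timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # nslookup missing, or it hung
    return not looks_failed(proc.stdout + proc.stderr)


def diagnose(google_ok: bool, others_ok: list) -> str:
    """Combine the first test (Google) and second test (other resolvers)."""
    if google_ok:
        return "not this problem: Google DNS resolves the name"
    if others_ok and all(others_ok):
        return "Google-specific: other resolvers succeed, Google fails"
    return "broader DNS problem: other resolvers fail too"
```

You would call it along the lines of `diagnose(lookup_ok(name, GOOGLE_DNS), [lookup_ok(name, r) for r in others])`, where `others` is your own list of the other public resolver addresses you want to compare against.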
Fix the Problem – NOW!
Given you might be stressed if you’ve reached this point, let’s get straight to fixing it. You should DO BOTH FIXES so it doesn’t occur again.
1. Immediate fix – short term
Google are being very friendly about helping out people affected by the problem, so they will “white-list” the necessary name servers. This will take less than 2 hours to take effect.
You have 2 steps to do:
i) find your name servers
If your problem record is “myserver.mydomain.com” then you want to find the Name Servers for the domain “mydomain.com”. You can do this by running the command below, or by going to your service provider’s web site (whoever you signed up with to provide your internet domain name) and looking there.
nslookup -type=SOA mydomain.com
Which gives us
primary name server = my.nameserver.com
responsible mail addr = support.mydomain.com
serial = 2015010901
refresh = 86400 (1 day)
retry = 7200 (2 hours)
expire = 3600000 (41 days 16 hours)
default TTL = 86400 (1 day)
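If you have several domains to check, pulling the primary name server out of that output by hand gets tedious. The sketch below shows one way to extract it in Python. The function name `primary_name_server` is made up for illustration. It assumes the Windows-style label shown above ("primary name server") and also accepts the "origin" label that the Linux (BIND) nslookup prints for the same field.

```python
import re


def primary_name_server(soa_output: str):
    """Extract the primary name server from `nslookup -type=SOA` output.

    Matches the Windows label ("primary name server = host") and the
    BIND label ("origin = host"). Returns None if neither is present.
    """
    match = re.search(
        r"^\s*(?:primary name server|origin)\s*=\s*(\S+)",
        soa_output,
        re.IGNORECASE | re.MULTILINE,
    )
    # Drop a trailing root dot so the result can be compared or pasted easily.
    return match.group(1).rstrip(".") if match else None
```

Fed the sample output above, it returns `my.nameserver.com`.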
ii) ask Google (nicely) to white-list the primary name server (the “primary name server” value in the output above).
Go to the Google Public DNS Discuss Group forum and post there asking for help. Our issue was resolved within the hour via this forum.
Then just watch for the fix to take effect.
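Watching for the fix can be automated with a simple polling loop. The sketch below is one way to do it and nothing more: `wait_until_resolves` is a hypothetical helper, the 5-minute interval is an arbitrary choice, and the 2-hour limit comes from the time frame mentioned above. The `check` argument is any function of yours that returns True once the lookup against Google DNS succeeds.

```python
import time
from typing import Callable


def wait_until_resolves(
    check: Callable[[], bool],
    interval: float = 300.0,          # seconds between attempts (arbitrary)
    max_wait: float = 2 * 60 * 60,    # give up after 2 hours
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call check() repeatedly until it returns True or max_wait elapses."""
    deadline = clock() + max_wait
    while True:
        if check():
            return True
        # Stop if another full interval would take us past the deadline.
        if clock() + interval > deadline:
            return False
        sleep(interval)
```

`sleep` and `clock` are parameters only so the loop can be exercised without really waiting. In normal use you would pass just `check`, for instance a function that runs nslookup against 8.8.8.8 and inspects the result.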
2. Longer-term fix
This is a longer-term fix because most service providers are not very responsive to such changes (we’re still waiting). Correcting what is probably a long-standing problem might be fairly hard for the service providers, but hopefully being driven by Google will help them get on with it. In the meantime, the short-term fix above by Google is exceptionally important to restoring normal operation.
We need to collect information to give to the responsible service provider so they can fix the issue. We can use the nslookup (Windows and Linux) or dig (Linux) commands to get it:
Using “nslookup” (Windows or Linux) – remember to replace the 2 names with yours:
nslookup myserver.mydomain.com my.nameserver.com
;; Got SERVFAIL reply from 126.96.36.199, trying next server
or, using “dig” on Linux:
dig +norecurse myserver.mydomain.com. @my.nameserver.com
;; Warning: Message parser reports malformed message packet.
;; Truncated, retrying in TCP mode.
Send the above error information to your service provider (the owners of the name server). They should hopefully take it seriously that Google aren’t liking their implementation of DNS and do something about it.
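If you want to see what the name server actually sends back over UDP, the sketch below builds a non-recursive query by hand (similar in spirit to `dig +norecurse`) and reports the header flags of the reply. It uses only the Python standard library. The function names are made up for illustration, and it only reads the 12-byte DNS header, so it will not catch every way a packet can be malformed; treat it as a rough probe, not a replacement for dig.

```python
import socket
import struct

RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
          3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a DNS query with the RD flag clear (no recursion requested).

    qtype 1 is an A record, 5 is a CNAME record.
    """
    # Header: id, flags (all zero), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # class 1 = IN


def summarize_response(data: bytes) -> dict:
    """Decode the DNS header of a reply into a small summary dict."""
    if len(data) < 12:
        return {"malformed": True}  # too short to hold a DNS header
    _id, flags, _qd, answers, _ns, _ar = struct.unpack("!HHHHHH", data[:12])
    rcode = flags & 0x000F
    return {
        "malformed": False,
        "is_response": bool(flags & 0x8000),   # QR bit
        "truncated": bool(flags & 0x0200),     # TC bit: retry over TCP
        "rcode": RCODES.get(rcode, str(rcode)),
        "answers": answers,
    }


def query_udp(server: str, name: str, qtype: int = 1, timeout: float = 5.0) -> dict:
    """Send one UDP query to server:53 and summarize the reply header."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qtype), (server, 53))
        data, _addr = sock.recvfrom(4096)
    return summarize_response(data)
```

Something like `query_udp("my.nameserver.com", "myserver.mydomain.com", qtype=5)` would then show whether the reply was truncated or carried an error code such as SERVFAIL, which is useful detail to include when you write to the provider.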
3. Alternative Longer-Term Fix
If you don’t see much joy from your service provider in addressing the long term fix (remember you can use the commands above to test every now and then whether they have changed anything) you could also change service providers. Amazon Web Services have a DNS service called Route53 which provides the ability to have an “alias” to one of the Load Balancers without using a CNAME record. Using Route53, you would never have been exposed to this particular issue.
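To give a feel for what an alias looks like, the sketch below builds the kind of change batch that, as far as I understand the Route 53 API, `aws route53 change-resource-record-sets --change-batch` accepts. The function name and every value passed to it are placeholders, not real settings. Note that the `HostedZoneId` inside `AliasTarget` is the load balancer’s own zone ID (AWS publishes one per region), not the ID of your domain’s zone.

```python
import json


def alias_change_batch(record_name: str, elb_dns_name: str,
                       elb_hosted_zone_id: str) -> dict:
    """Build a Route 53 change batch that aliases record_name to an ELB."""
    return {
        "Comment": "Alias record pointing at an ELB instead of a CNAME",
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "A",  # an alias is an A record with no TTL of its own
                "AliasTarget": {
                    "HostedZoneId": elb_hosted_zone_id,
                    "DNSName": elb_dns_name,
                    "EvaluateTargetHealth": False,
                },
            },
        }],
    }


def to_json(batch: dict) -> str:
    """Serialize the batch so it can be saved to a file for the AWS CLI."""
    return json.dumps(batch, indent=2)
```

Resolvers then receive ordinary A records for your name, so no CNAME lookup is involved.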
Changing service providers is non-trivial, so do your reading and decide for yourself. I’ll certainly be looking, because our service provider has been disappointing in resolving this urgent outage.
Google’s April 2015 changes to DNS processing have been disruptive (certainly for me), but at least Google has provided a responsive way to work around the issue.
The longer term fix is for you to convince your DNS service provider to respond to UDP protocol requests as expected by Google. Remember you can always switch to a DNS service provider that doesn’t experience the problem.