DNS Design

ns0a is the primary name server. Every four hours, cron runs a Perl script, /opt/dns-auth/bin/dns_build_configs2, to build the various DNS configuration files. It builds several:

named.conf.master: the named.conf for the main authoritative DNS server (the one that runs out of /opt/dns-auth)
named.conf.master.secondary: the named.conf for the main secondary server (on ns0 in /opt/dns-secondary)
named.conf.auth: the named.conf that the authoritative name servers in ***** use
named.conf.auth.hou: the named.conf that the authoritative name servers ***** use (that is, the systems in ****** that are authoritative for ***** zones)
named.conf.forward: the named.conf that the caching name servers use. It's called forward because it has forwarders clauses for all of our internal zones, pointing back to our authoritative name servers.
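
To make that concrete, here's a minimal sketch (not the real dns_build_configs2 code) of the kind of forward-only stanza that ends up in named.conf.forward for each internal zone. The zone name and IP addresses are placeholders.

    # Hypothetical sketch: build a forward-only stanza for one internal zone,
    # pointing queries back at our authoritative servers.  The zone name and
    # addresses are placeholders, not our real ones.
    use strict;
    use warnings;

    sub forward_stanza {
        my ($zone, @auth_ips) = @_;
        my $fwd = join '; ', @auth_ips;
        return "zone \"$zone\" {\n"
             . "    type forward;\n"
             . "    forward only;\n"
             . "    forwarders { $fwd; };\n"
             . "};\n\n";
    }

    print forward_stanza('example.internal', '192.0.2.10', '192.0.2.11');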

dns_build_configs2 builds the conf files based mostly on /opt/dns-auth/bin/zones/named.master, a flat file with a pretty simple format. dns_build_configs2 writes log files to /opt/dns/etc/zones/production/log.

The configuration files are built and then moved into place; ns0's configuration files are installed by dns_build_configs2 itself. The rest of the config files are moved into /opt/dns-auth/etc/conf so they can be downloaded by the outer name servers. Those name servers use a Perl script, pullnamedconf.pl, which runs once an hour. It uses ssh to log into ns0a as the user named, checks the timestamp of the config file it's downloading, and scps the file when it changes. The file is moved into place and rndc reconfig is run. pullnamedconf.pl's parameter is the name of the config file to download for that machine (so ns1a.auth runs /opt/dns/pullnamedconf.pl auth). At the end of dns_build_configs2, copies of all of the conf files are placed into /opt/dns-auth/etc/zones/production.
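
For illustration, the flow of pullnamedconf.pl boils down to something like the sketch below. This is not the script itself; the local named.conf path, the timestamp-file bookkeeping, and the remote stat invocation are assumptions.

    #!/usr/bin/perl
    # Illustrative sketch of the pullnamedconf.pl flow, not the real script.
    # The local named.conf path, the stamp file, and the remote stat call are
    # assumptions.
    use strict;
    use warnings;

    my $which  = shift or die "usage: $0 <config-name>  e.g. $0 auth\n";
    my $remote = "/opt/dns-auth/etc/conf/named.conf.$which";
    my $local  = "/etc/named.conf";
    my $stamp  = "/var/tmp/named.conf.$which.mtime";

    # Ask ns0a (as the user named) for the config file's mtime.
    my $mtime = `ssh named\@ns0a perl -e 'print((stat("$remote"))[9])'`;
    chomp $mtime;
    die "couldn't stat $remote on ns0a\n" unless $mtime =~ /^\d+$/;

    # Compare against the mtime we saw on the last run.
    my $seen = 0;
    if (open my $fh, '<', $stamp) { $seen = <$fh> || 0; chomp $seen; }
    exit 0 if $mtime == $seen;                 # nothing changed, nothing to do

    # Pull the new file, move it into place and tell named about it.
    system("scp named\@ns0a:$remote $local.new") == 0 or die "scp failed\n";
    rename "$local.new", $local or die "rename: $!\n";
    system("rndc reconfig") == 0 or die "rndc reconfig failed\n";

    open my $out, '>', $stamp or die "$stamp: $!\n";
    print $out "$mtime\n";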

dns_build_configs2 also enforces the zone audit (it removes bad zones from named.master). The auditing itself is done by /opt/dns-auth/etc/bin/dnsaudit.pl, another Perl script, which runs once a day. It saves its audit data to a Berkeley DB in /opt/dns-auth/etc/conf. dnsreport.pl runs immediately after dnsaudit.pl; it writes an audit report to /opt/dns-auth/etc/zones/production/audit_output. The auditing code checks to make sure each zone is in the root name servers and that we are authoritative for it. A zone may fail the audit for 4 straight days before it's removed from named.master. Putting it back into named.master will reactivate the zone and give it another 4 days to shape up.
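
The checks themselves amount to something like the following sketch (using Net::DNS). This is not the actual dnsaudit.pl code: it resolves the delegation with an ordinary recursive lookup rather than walking down from the roots, and the zone and server names are placeholders.

    #!/usr/bin/perl
    # Sketch of the two audit checks, not the real dnsaudit.pl.  The zone and
    # the list of "our" server names are placeholders.
    use strict;
    use warnings;
    use Net::DNS;

    my $zone   = shift || 'example.com';
    my @our_ns = qw(ns1.example.net ns2.example.net);   # placeholder names

    # Check 1: the zone's delegation is visible from the public DNS tree,
    # i.e. an NS lookup resolves and points at (some of) our servers.
    my $res      = Net::DNS::Resolver->new;
    my $ns_reply = $res->query($zone, 'NS');
    my %seen;
    if ($ns_reply) {
        $seen{lc $_->nsdname} = 1
            for grep { $_->type eq 'NS' } $ns_reply->answer;
    }
    my $delegated = grep { $seen{lc $_} } @our_ns;

    # Check 2: one of our servers answers for the zone with the AA
    # (authoritative answer) bit set.
    my $auth = Net::DNS::Resolver->new(nameservers => [$our_ns[0]],
                                       recurse     => 0);
    my $soa_reply     = $auth->send($zone, 'SOA');
    my $authoritative = ($soa_reply && $soa_reply->header->aa) ? 1 : 0;

    if ($delegated && $authoritative) {
        print "$zone: OK\n";
    } else {
        print "$zone: FAILED audit (delegated=$delegated authoritative=$authoritative)\n";
    }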

/opt/dns-auth/etc/zones is NFS mounted on each of the orbit machines that need it (at least all four of the web servers) and on ***** so the techs can run command-line tools to edit the zones. There are pages in ****** that allow zones to be edited, but anything complicated is beyond its ability. ***** is also pretty easy to trick into breaking a zone file. Unless support is adding a lot of zones at once, they are supposed to use **** as much as possible. ****** keeps a list of zones in a database table, so if a zone isn't in the table it doesn't appear in ****. ********** also knows nothing about in-addr.arpa or secondary zones. The scripts on ******** are edit_zone, edit_reverse, and several other Perl scripts to edit and add zones. Currently the scripts need to be run through sudo.

The customer side of Orbit lets customers edit their zones, but I don't think it allows them to actually add zones. This is to prevent bad things from happening.

There is a zone whitelist in /opt/dns-auth/etc/zones/named.whitelist. This list of zones, plus any in-addr.arpa zones, is added to the forward named.conf. There is a zone blacklist in /opt/dns-auth/etc/zones/named.blacklist.
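
How dns_build_configs2 applies these lists isn't spelled out above, so the sketch below is only an assumption about the mechanics: one zone per line in each file, with the blacklist simply excluding zones.

    # Hypothetical sketch of the whitelist/blacklist filtering.  The one-zone-
    # per-line format and the "blacklist excludes" behaviour are assumptions.
    use strict;
    use warnings;

    sub read_zone_list {
        open my $fh, '<', $_[0] or return ();
        map { chomp; lc $_ } grep { /\S/ } <$fh>;
    }

    my $dir = '/opt/dns-auth/etc/zones';
    my (%white, %black);
    $white{$_} = 1 for read_zone_list("$dir/named.whitelist");
    $black{$_} = 1 for read_zone_list("$dir/named.blacklist");

    # A zone gets a forwarders stanza in named.conf.forward if it is
    # whitelisted or is an in-addr.arpa zone, and is not blacklisted.
    sub wants_forward_stanza {
        my $zone = lc shift;
        return 0 if $black{$zone};
        return 1 if $white{$zone} or $zone =~ /\.in-addr\.arpa$/;
        return 0;
    }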

There's another ns0, ns0.*******.com, which is used for the domains that were registered through us. It runs a much older version of dns_build_configs and supports 4 authoritative name servers, which are SunFire v120s.

When I got here three years ago, we had ns1 and ns2.****. In an emergency, a couple of weeks after I started, we built ns1 and ns2.****.com. All four sets of systems were authoritative. At the time, both sets were SunFire v120s (8 for ***** and 12 for *****) behind single Alteon 180e load balancers. Later, when we opened ******, we added 4 v220s there. These systems were also configured, at the time, to be authoritative.

Over time, some of the v120s failed and were replaced with Linux dual-Xeon boxes, each built differently at a different time, so about 18 months ago we decided to replace the v120s and rebuild all of the systems to match each other. All of the v120s are now retired (except for four that we use for *****.com, and 4 v220s at *******), and all of the Linux systems have been rebuilt using a single kickstart configuration. All of these systems are dual Xeons with 2 GB of RAM. About half have RAID and about half have redundant power. If a set of machines doesn't have redundant power, then the systems in the farm are at least powered redundantly between themselves.

Once we rebuilt the two original farms, we built nine new authoritative name servers: five at ***** and four at ******. A couple of months ago we added four more authoritative servers in *****. These systems are also built using the kickstart config and are all identical: Dell PE1850s with mirrored root drives, 4 GB of RAM, and redundant power.

When we first started rebuilding ns1 and ns2.***.com, we converted the name servers in *** and **** to caching only. After we put the authoritative name servers into production, we converted the ***** load-balanced pools to caching only as well. All of the caching name servers are locked down; they will not respond to queries from outside our network. A list of our netblocks is kept in /opt/dns-auth/etc/zones/conf/localnets.txt.
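
The exact mechanism for the lockdown isn't described here, but conceptually it amounts to turning localnets.txt into an address match list that the caching configs restrict queries to. A hypothetical sketch follows; the file format and the specific BIND options chosen are assumptions.

    # Hypothetical sketch: turn localnets.txt (assumed to be one netblock per
    # line, with optional # comments) into a BIND acl plus allow-query /
    # allow-recursion lines for the caching name servers' config.
    use strict;
    use warnings;

    my $file = '/opt/dns-auth/etc/zones/conf/localnets.txt';
    open my $fh, '<', $file or die "$file: $!\n";
    my @nets = grep { /\S/ && !/^\s*#/ } <$fh>;
    chomp @nets;

    print "acl \"ournets\" {\n";
    print "    $_;\n" for @nets;
    print "};\n\n";
    print "options {\n";
    print "    allow-query     { \"ournets\"; };\n";
    print "    allow-recursion { \"ournets\"; };\n";
    print "};\n";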

Each of these systems has a distinct public IP address and an IP on the management network, except for the systems in ****.

As you can see, the authoritative name servers are not load balanced. These systems average three or four nines of availability by themselves, and I felt that load balancers would just be expensive and just as likely to break. Additionally, I'd rather have more IP addresses in the root name servers, not fewer. If at some point our needs change (say we open four new datacenters around the country and decide to deploy authoritative name servers in each of them), it should be trivial to add load balancers in front of the systems.

The authoritative name servers carry 53K zones. It takes about 25-30 minutes to start BIND. The memory footprint is about 350 MB and grows very little. The caching name servers start instantly.

All of the Linux systems run a cron job that checks whether the UDP receive queue has been greater than 500K for more than 10 minutes. If that condition is true, it leaves a lock file in /tmp. Every 15 minutes a cron job on ns0a uses ssh to log into each Linux name server and check for the lock file. If it's there, named is killed (using SIGKILL) and restarted. We find that this restarts BIND just about the time it becomes too big to keep up with requests. This runs on all of the Linux systems, but it has only ever restarted BIND on the caching name servers. Almost all of the DNS-related calls you'll get while on call fix themselves through this system.
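
The local half of that watchdog boils down to something like the sketch below. The 500K / 10-minute rule is from above; the state and lock file names and the exact accounting are assumptions.

    #!/usr/bin/perl
    # Sketch of the per-host check: if the UDP receive queue on port 53 has
    # been over 500K for 10+ minutes, leave a lock file in /tmp for the ns0a
    # restart job to find.  The state/lock file names are assumptions.
    use strict;
    use warnings;

    my $threshold = 500 * 1024;              # bytes queued on port 53 sockets
    my $hold      = 10 * 60;                 # seconds it has to stay that high
    my $state     = '/tmp/udp53.first_seen';
    my $lock      = '/tmp/named.needs_restart';

    # In /proc/net/udp, local_address is hex "ip:port" (port 53 is 0035) and
    # the 5th field is hex "tx_queue:rx_queue".
    my $rx = 0;
    open my $fh, '<', '/proc/net/udp' or die "/proc/net/udp: $!\n";
    while (<$fh>) {
        my @f = split ' ';
        next unless $f[1] && $f[1] =~ /:0035$/;
        my (undef, $q) = split /:/, $f[4];
        $rx += hex $q;
    }

    if ($rx < $threshold) {                  # queue is fine, reset the clock
        unlink $state;
        exit 0;
    }

    if (!-e $state) {                        # first time it's looked backed up
        if (open my $s, '>', $state) { print $s time() }
        exit 0;
    }

    open my $s, '<', $state or exit 0;
    my $since = <$s> || 0;
    if (time() - $since >= $hold) {          # backed up long enough: flag it
        open my $l, '>', $lock or die "$lock: $!\n";
        close $l;                            # ns0a's cron will kill and restart named
    }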

The next major DNS-related project we need to complete is migrating all of the zone data and provisioning into an SQL database. This is a work in progress that's probably about 80-90% done at this point. There is a system at *****, dnsdb01, that has a MySQL instance and a database, dnsdb. The code that's been written, again in Perl, is on ns0a in /opt/dns-test/etc/bin/dnssql. The idea is to do roughly what we're doing now, except that the sshing and scping are replaced with SQL queries. There will be a cron job on each of the name servers that generates a new config file every now and again (my goal is to get it down to 1-1.5 hours between rebuilds). ns0a will run a daemon that checks for zone changes every few minutes, generates new zones, and reloads them as needed. Orbit will talk directly to the MySQL database. This gets Orbit out of the business of writing zone files (something it should not be doing), provides us with a buffer between the humans and the name servers, and generally makes it easier to manage the data.
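
Since the schema isn't final, the sketch below of the ns0a daemon loop is only illustrative: it assumes a zones table with name and last_changed columns, and the DSN, credentials, and write_zone_file() helper are all hypothetical.

    #!/usr/bin/perl
    # Sketch of the planned ns0a zone-change daemon, under an assumed schema
    # (a `zones` table with `name` and `last_changed` columns).  The DSN,
    # credentials and write_zone_file() are hypothetical.
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('DBI:mysql:database=dnsdb;host=dnsdb01',
                           'dnsuser', 'secret', { RaiseError => 1 });

    my $last_check = 0;
    while (1) {
        my $now   = time();
        my $zones = $dbh->selectcol_arrayref(
            'SELECT name FROM zones WHERE last_changed > FROM_UNIXTIME(?)',
            undef, $last_check);

        for my $zone (@$zones) {
            write_zone_file($dbh, $zone);             # regenerate from SQL
            system('rndc', 'reload', $zone) == 0
                or warn "reload of $zone failed\n";
        }

        $last_check = $now;
        sleep 300;                                    # "every few minutes"
    }

    sub write_zone_file {
        my ($dbh, $zone) = @_;
        # ... pull the zone's records out of dnsdb and write the zone file
        # that named loads; elided in this sketch.
    }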

The last time I got any kind of status on the *** side of this (it's been a while since it got any real time in Development), they indicated that it was about 85% finished but still incomplete. Our side of this, the provisioning, really hasn't gotten much more than a few hours of time in the last couple of months. I'd estimate we probably need 15 to 20 more hours to get it all working, plus more time for testing and refinement. I also intend to maintain the command-line interfaces on **** for compatibility, cultural, and efficiency reasons.