Domain Name System (failure): Ancient Technology Ruling the Internet

During my studies at the University of Twente, I developed a fascination for the Domain Name System (DNS). How could a thirty-five-year-old, insecure-by-design, barely standardized system still be among the most widely used technologies today? Even after decades of security and performance improvements, the DNS remains a quirky part of the Internet. During my graduation research on domain name management, I took ample time to get to know the dark, funny and sometimes scary properties of the DNS. In this article, I will guide you through some of the facts that may raise your eyebrows.

History and basics

The purpose of the DNS is to resolve domain names. ‘utwente.nl’ is an example of such a domain name, consisting of two labels: ‘utwente’ and ‘nl’. Typing ‘utwente.nl’ in your browser does not tell your computer where the UT servers are, so it queries the DNS for the corresponding IP address. This works in a hierarchical way: starting at the root, you are referred to the server for ‘nl’, which in turn refers you to the server that knows the IP address for ‘utwente.nl’. A fully qualified domain name is supposed to end with a dot, indicating the root (empty) label at the end.
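To make the hierarchy concrete, here is a minimal Python sketch that splits a name into labels and lists the zones a resolver would walk through, from the root downwards. The function names are my own for illustration, and the sketch does no actual network lookups.

```python
def split_labels(name: str) -> list[str]:
    """Split a domain name into labels, ignoring one optional trailing root dot."""
    trimmed = name[:-1] if name.endswith(".") else name
    return trimmed.split(".") if trimmed else []


def delegation_chain(name: str) -> list[str]:
    """Return the zones visited during resolution, starting at the root."""
    chain = ["."]          # the root is the empty label, written as a single dot
    suffix = ""
    for label in reversed(split_labels(name)):
        suffix = f"{label}.{suffix}"   # prepend the next label, keep the root dot
        chain.append(suffix)
    return chain


print(split_labels("utwente.nl"))      # ['utwente', 'nl']
print(delegation_chain("utwente.nl"))  # ['.', 'nl.', 'utwente.nl.']
```

Note that the chain is written in fully qualified form, with the trailing dot that marks the root.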

The DNS is a distributed system and in 1985 replaced the single shared ‘hosts.txt’ file that was used for this purpose until then. Its ‘formal’ specification was captured in standards RFC 1034 [1] and 1035 [2], although plenty of choices are left for actual implementation. So far, it appears as a simple process, with a straightforward goal and a suitable solution. Let’s get to it.

Insecure-by-design

A mere five years after its conception, major security design flaws were found in the DNS [3]. Basically, the distributed system did not provide any trust anchors with which you could confirm that a DNS response was genuine. Instead, anyone could reply to your DNS query and redirect you to an IP address controlled by them. Not good!

Several improvements were made to mitigate this type of vulnerability, yet the ‘full fix’, being the DNS Security Extensions (DNSSEC), is far from being fully adopted. That means that the majority of the DNS infrastructure, in theory, remains vulnerable to this thirty-year-old security issue. Nice going!

The secret of the trailing dot

As mentioned earlier, domain names should end with a trailing dot. Yet, in our daily use of domain names when navigating to websites, this is not used… Right?! Well, citing RFC 1034: “[...] a multi-label relative name is often one where the trailing dot has been omitted to save typing.”

This means that for the DNS, ‘utwente.nl’ is the same domain name as ‘utwente.nl.’. This is where it gets grim. While the DNS might see them as equals, in the eyes of browsers these are two different domain names. This means that cookies, sessions and the like are not shared between the two. To make matters worse, since both variants are valid values for the ‘Host’ header of the HTTP protocol, web servers can differentiate between them as well. We observe this in practice at our own university, as the UT domain name with a trailing dot currently gives a 404 error instead of the UT website (Figure 1).
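The mismatch can be illustrated with a small Python sketch. Both functions are hypothetical simplifications of my own: the first compares names the way the DNS treats them (case-insensitive, root dot optional), the second compares the literal host strings, which is roughly what a cookie jar or a web server matching on the ‘Host’ header might do.

```python
def dns_equal(a: str, b: str) -> bool:
    """Compare two names as the DNS sees them: same labels, root dot optional."""
    def canonical(name: str) -> str:
        name = name.lower()                      # DNS names are case-insensitive
        return name if name.endswith(".") else name + "."
    return canonical(a) == canonical(b)


def literal_equal(a: str, b: str) -> bool:
    """Compare host strings literally (assumed browser or web server behavior)."""
    return a == b


print(dns_equal("utwente.nl", "utwente.nl."))      # True
print(literal_equal("utwente.nl", "utwente.nl."))  # False
```

Any software that keys state on the literal string therefore ends up with two separate entries for what the DNS considers one name.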

The obscure specification of DNS might have saved us from writing one extra dot all the time, yet has given us two ambiguous domain names with unpredictable behavior in return. Fair trade, or a bad idea?

Figure 1: utwente.nl. is not recognized by UT’s web server

Internationalized Domain Names

Not too long after the emergence of DNS, people realized that restricting domain names to Latin/ASCII characters only would not really contribute to the concept of “one world, one Internet”. Yet, any programmer knows that handling foreign scripts can be a real pain. For so-called Internationalized Domain Names (IDNs), the solution was found in using Punycode, a representation of Unicode using only ASCII characters [4]. I won’t dive into the details here, but every IDN label starts with ‘xn--’, followed by the representation in ASCII. For example, münich.com would translate to xn--mnich-kva.com. Since regular domain names may not contain hyphens in both the third and fourth position of a label, this prefix unambiguously indicates an IDN. This way, local scripts can be used in domain names, while barely impacting the technical implementation of DNS. So far so good!
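For the curious, the translation can be reproduced with Python's built-in ‘punycode’ codec. This is a minimal sketch: the helper names are mine, and it deliberately skips the normalization and validation steps that a real IDNA implementation performs before encoding.

```python
def to_ascii(domain: str) -> str:
    """Encode each non-ASCII label with Punycode and add the 'xn--' prefix."""
    encoded = []
    for label in domain.split("."):
        if label.isascii():
            encoded.append(label)   # plain ASCII labels are left untouched
        else:
            encoded.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(encoded)


def to_unicode(domain: str) -> str:
    """Reverse the encoding for every label that carries the 'xn--' prefix."""
    decoded = []
    for label in domain.split("."):
        if label.lower().startswith("xn--"):
            decoded.append(label[4:].encode("ascii").decode("punycode"))
        else:
            decoded.append(label)
    return ".".join(decoded)


print(to_ascii("münich.com"))           # xn--mnich-kva.com
print(to_unicode("xn--mnich-kva.com"))  # münich.com
```

The ASCII letters of the label are copied first (‘mnich’), and the suffix after the last hyphen (‘kva’) encodes which non-ASCII character goes where.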

The bad thing about Unicode in the context of IDNs is that it contains so many characters. Many scripts use characters that look identical to an ASCII letter, yet are different characters in Unicode. Unicode also contains emojis, many of which are hard to distinguish from each other. You can see the problem here: unrestricted use of Unicode in IDNs is a recipe for disaster.

While being a bit late to the party, the Internet community saw this problem too and restricted IDNs by means of the IDNA2003 protocol, later superseded by the IDNA2008 protocol [5]. Among other things, IDNA restricts the usage of emoji and the use of multiple scripts in a single domain label. Sadly, these protocols came too late to prevent the registration of some troublesome IDNs, for instance xn--n3h.com (the snowman emoji WITH snow; the one without snow is not registered, see Figure 2).

Even with IDNA, dubious domain names can still be registered. In 2017, a security researcher registered the IDN xn--80ak6aa92e.com. These days, you will most likely get a security warning from your browser, but this IDN uses Cyrillic characters that look identical to the word ‘apple’. Thus, while using a single script in the domain name label, a company name in the Latin script has been successfully imitated.
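You can check this yourself with a few lines of Python. The sketch below decodes the label and guesses the script of each character from the first word of its Unicode name. That is a rough heuristic of my own, not the official Unicode Script property (which the standard library does not expose), and the helper names are hypothetical.

```python
import unicodedata


def decode_label(label: str) -> str:
    """Decode a single 'xn--' label to Unicode; other labels pass through."""
    if label.lower().startswith("xn--"):
        return label[4:].encode("ascii").decode("punycode")
    return label


def scripts_in(text: str) -> set[str]:
    """Approximate the scripts used, based on the first word of each character name."""
    return {unicodedata.name(ch).split()[0] for ch in text}


label = decode_label("xn--80ak6aa92e")
print(label)                # looks like 'apple', but is not
print(label == "apple")     # False
print(scripts_in(label))    # {'CYRILLIC'}
print(scripts_in("apple"))  # {'LATIN'}
```

The label passes a single-script check without trouble, which is exactly why such a check alone does not stop this kind of imitation.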

Figure 2: Some emoji-based DNS hostnames were registered before the IDNA protocols. The snowman with snow emoji is registered as a .com domain name (xn--n3h.com). The snowman without snow is not registered (xn--58h.com) and never will be.

Forever unencrypted?

Another classic issue with DNS is that in its original design, the protocol is not encrypted. While our actual website traffic is mostly encrypted these days through the use of HTTPS instead of HTTP, encrypted variants of DNS are lagging behind. In July 2023, less than a quarter of all DNS queries from the Netherlands were encrypted [6]. This results in a potential leak of all domain names that you have visited (or at least resolved).

Interestingly enough, two quite suitable solutions for DNS encryption exist already. DNS-over-TLS (DoT) wraps DNS messages in TLS, the same encryption layer that HTTPS uses. DNS-over-HTTPS (DoH), in contrast, encapsulates the DNS request inside an HTTPS request, making it barely distinguishable from regular website traffic. As far as I can see, the question should not be whether we want to encrypt all our DNS traffic, but rather by which means.
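To show what ‘encapsulating’ means in practice, the following Python sketch builds a plain DNS query in wire format and packs it into the URL of a DoH GET request, following the base64url encoding described in RFC 8484. The endpoint ‘https://dns.example/dns-query’ is a placeholder of my own, not a real resolver, and the sketch only constructs the URL without sending anything.

```python
import base64
import struct


def build_query(name: str, qtype: int = 1, query_id: int = 0) -> bytes:
    """Build a minimal DNS query message for one question (qtype 1 = A record)."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b""
    for label in name.rstrip(".").split("."):
        encoded = label.encode("ascii")
        qname += bytes([len(encoded)]) + encoded   # length-prefixed label
    qname += b"\x00"                               # the root label ends the name
    question = qname + struct.pack("!HH", qtype, 1)  # class 1 = IN (Internet)
    return header + question


def doh_get_url(name: str, endpoint: str = "https://dns.example/dns-query") -> str:
    """Wrap the query in a DoH GET URL: base64url without padding in 'dns='."""
    wire = build_query(name)
    param = base64.urlsafe_b64encode(wire).rstrip(b"=").decode("ascii")
    return f"{endpoint}?dns={param}"


print(doh_get_url("www.example.com"))
# https://dns.example/dns-query?dns=AAABAAABAAAAAAAAA3d3dwdleGFtcGxlA2NvbQAAAQAB
```

To an observer on the network, a request to such a URL over HTTPS looks like any other encrypted web request. With DoT, the same wire-format message would instead be sent over a dedicated TLS connection.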

Internet governance

Finally, I want to spend a few words on how the DNS is governed and how you could contribute to it. As explained, the technical foundations are outlined in RFCs. These documents are maintained by the IETF, yet anyone in the Internet community can contribute to them. If this article raised your interest in the technical foundations of DNS, be sure to get involved in the related mailing lists. Perhaps you can prevent the next ‘epic fail’ that would otherwise have been added to this article.

Policies on domain names, top-level domains and the like, in contrast, are set by ICANN. This non-profit organization operates under a multi-stakeholder governance model in which individuals, organizations, industry, and governments collaborate. The discussions here can be of a technical nature, yet you can expect quite some politics as well. If this sparks your interest, be sure to check out the introductory programs they offer specifically for students [6]!