D. Štrbac

Simpler Email Address Format

Email addresses carry a legacy of outdated conventions and cruft, which makes validating them notoriously complex even though many features, such as address literals and source routes, are obsolete or rarely used. A regular expression that tries to cover the full grammar can be extraordinarily intricate.

An email address as written in a message header often includes a "display name", but this free-form label is one of the primary offenders in email security: it can be set to any text and is easily abused to impersonate trusted senders, so it must be dropped from use. Let's focus only on the address part here.
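When addresses arrive in header form, the display name can be discarded before any validation. A minimal sketch using Python's standard `email.utils.parseaddr`, which splits a header value into a (display name, address) pair; note that it only splits and does not validate the address. The helper name `address_only` is hypothetical:

```python
from email.utils import parseaddr


def address_only(header_value: str) -> str:
    """Return only the address part of a header value, discarding any display name."""
    _display_name, address = parseaddr(header_value)  # display name deliberately ignored
    return address


print(address_only('"Alice Example" <alice@example.com>'))  # alice@example.com
print(address_only("bob@example.com"))                      # bob@example.com
```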

In practice, the address format used by most systems today is remarkably straightforward. An unwritten, contemporary standard for email addresses exists, one that has evolved organically and been embraced by email users themselves.

An email address comprises two segments: the local part, situated to the left of the "@" symbol, and the hostname, positioned on the right side of it.
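Because neither segment may contain "@" in this simplified format, splitting an address reduces to finding exactly one separator with text on both sides. A sketch in Python; `split_address` is a hypothetical helper name:

```python
def split_address(address: str) -> tuple[str, str]:
    """Split an address into (local part, hostname) at its single "@".

    Raises ValueError unless there is exactly one "@" with non-empty text on both sides.
    """
    if address.count("@") != 1:
        raise ValueError("expected exactly one '@'")
    local, _, host = address.partition("@")
    if not local or not host:
        raise ValueError("local part and hostname must both be non-empty")
    return local, host


print(split_address("name.surname@example.com"))  # ('name.surname', 'example.com')
```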

In this format, the local part may contain ASCII letters, digits, and four special characters: ".", "-", "+", and "_". The local part must not start or end with a special character, and two special characters may not appear in a row. These characters commonly serve as word separators (e.g., name.surname) and sometimes have functional roles, such as tagging messages for sorting or filtering (e.g., name+newsletter). How mail systems interpret them is beyond the scope of this discussion.
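The local-part rules can also be checked without a regular expression, which makes each rule visible on its own. A minimal Python sketch, assuming "alphanumeric" means ASCII letters and digits (as the `[a-z0-9]` class in the pattern implies); `is_valid_local_part` is a hypothetical name:

```python
import string

# Assumption: only ASCII letters and digits count as alphanumeric.
ALNUM = set(string.ascii_letters + string.digits)
SEPARATORS = set(".-+_")


def is_valid_local_part(local: str) -> bool:
    """Alphanumerics plus . - + _ used as single separators, never at either end."""
    if not local:
        return False
    if local[0] not in ALNUM or local[-1] not in ALNUM:
        return False  # must start and end with a letter or digit
    previous_was_separator = False
    for ch in local:
        if ch in SEPARATORS:
            if previous_was_separator:
                return False  # two special characters in a row
            previous_was_separator = True
        elif ch in ALNUM:
            previous_was_separator = False
        else:
            return False  # character outside the permitted set
    return True


print(is_valid_local_part("name.surname"))   # True
print(is_valid_local_part("name..surname"))  # False
```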

The hostname, on the other hand, can be any valid DNS hostname. IP addresses, whether bare or written as address literals such as [192.0.2.1], are not accepted as a substitute for hostnames.
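A separate hostname check can make the IP-address exclusion explicit. The sketch below is in Python and assumes RFC 1123 label syntax (letters, digits and hyphens, no hyphen at either end, at most 63 characters per label and 253 overall), at least two labels, and, following RFC 3696, a top-level label that is not all-numeric. The function names are hypothetical:

```python
import ipaddress
import re

# One DNS label: 1-63 ASCII letters, digits or hyphens, starting and ending with a letter or digit.
LABEL = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?", re.IGNORECASE | re.ASCII)


def is_ip_address(text: str) -> bool:
    """True if text parses as a bare IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(text)
    except ValueError:
        return False
    return True


def is_acceptable_hostname(host: str) -> bool:
    """Accept a dotted DNS hostname; reject IP addresses and malformed labels."""
    if is_ip_address(host) or len(host) > 253:
        return False  # explicit IP rejection; the label rules below would also catch most cases
    labels = host.split(".")
    return (
        len(labels) >= 2                                  # assumption: a top-level domain is required
        and all(LABEL.fullmatch(label) for label in labels)
        and not labels[-1].isdigit()                      # top-level label must not be all-numeric
    )
```

Bracketed literals such as [192.0.2.1] fail the label check because "[" and "]" are not permitted. Unlike the single regular expression, this check accepts labels with consecutive hyphens, such as the xn-- labels used for internationalized domain names.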

In summary, the simplified regular expression below, in which the inline (?i) flag makes matching case-insensitive, has proven more than adequate for the vast majority of cases:

(?i)^[a-z0-9]+([._+-][a-z0-9]+)*@[a-z0-9]+([.-][a-z0-9]+)*\.[a-z]{2,}$
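A minimal Python sketch of applying the pattern. It passes case-insensitivity as `re.IGNORECASE` rather than an inline `(?i)`, because Python 3.11 and later reject a global inline flag that does not appear at the very start of the pattern. It also adds `re.ASCII`, so that under case-insensitive matching `[a-z]` matches only ASCII letters (without it, Python also matches a few non-ASCII letters such as the Kelvin sign). Inside each repeated group the separator is mandatory (`[._+-]` rather than `[._+-]?`); this accepts the same strings while avoiding exponential backtracking on long inputs that fail to match. `is_valid_email` is a hypothetical name:

```python
import re

# Simplified address pattern; flags replace the inline (?i), and fullmatch replaces ^...$.
EMAIL = re.compile(
    r"[a-z0-9]+([._+-][a-z0-9]+)*@[a-z0-9]+([.-][a-z0-9]+)*\.[a-z]{2,}",
    re.IGNORECASE | re.ASCII,
)


def is_valid_email(address: str) -> bool:
    """Return True if the entire string matches the simplified address format."""
    # fullmatch, unlike a pattern ending in $, does not accept a trailing newline.
    return EMAIL.fullmatch(address) is not None


print(is_valid_email("First.Last+news@Mail.Example.ORG"))  # True
print(is_valid_email("name@192.0.2.1"))                    # False
```

As with the pattern itself, this rejects hostname labels containing consecutive hyphens, such as the xn-- labels of internationalized domain names.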

I have yet to encounter a scenario where the aforementioned format fails to meet the requirements.