Email Validation Using Regular Expressions
Finding patterns in strings is a common occurrence in interface design. Ask someone to enter their email or physical address and who knows what you will get. Validating this data can be a time-consuming and tedious task. There are only so many indexOf calls and if/else statements one can handle. An alternative is using regular expressions.
What are Regular Expressions?
Regular expressions describe a pattern of characters in a string. Often shortened to regExp or regEx, such a pattern can be used (among other things) to match the content of strings.
For example, if I wanted to check whether someone knew my name or nickname, I could use the regular expression /will/ to match the strings "william" and "willie". However, that also matches names I don't go by, such as "willow" or "willis". We can tighten the expression by adding an "i": /willi/. That rules out "willow" but not "willis". So we adjust the expression using alternation, a metacharacter that works much like a conditional "or": /willi(e|am)/.
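The name example can be checked with a short JavaScript sketch. It relies on the standard test method of regular expression objects; the names array and the variable names are assumptions made for illustration.

```javascript
// Sample inputs; the list itself is made up for illustration.
var names = ["william", "willie", "willow", "willis"];

var loose = /will/;          // matches any string containing "will"
var strict = /willi(e|am)/;  // matches strings containing "willie" or "william"

// Keep only the names each pattern accepts.
var looseMatches = names.filter(function (name) { return loose.test(name); });
var strictMatches = names.filter(function (name) { return strict.test(name); });

console.log(looseMatches);   // all four names
console.log(strictMatches);  // only "william" and "willie"
```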
As another example, let's validate a phone number. Let's restrict our requirements to a local 7-digit US number to keep it simple. We define the acceptable formats as:
- 123-3456
- 123.3456
- 1233456
We start the expression with the "^" metacharacter, which anchors the match to the beginning of the string so nothing can come before it. We follow that with "\d", which matches a single digit (0-9). Adding "{3}" requires exactly three of them. We are currently at: /^\d{3}/.
We accept any character next, so we add a dot, ".", which matches any character, followed by "{0,1}", which means zero or one occurrence but no more. Next we add \d{4} to match the last four digits. To make sure there is nothing afterward we use the metacharacter "$". Finally we have the expression: /^\d{3}.{0,1}\d{4}$/.
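To see the finished phone pattern at work, here is a minimal sketch that runs it against the three accepted formats and two other inputs. The helper name isLocalPhone is hypothetical and exists only for this example.

```javascript
// The seven-digit pattern built up above.
var phonePattern = /^\d{3}.{0,1}\d{4}$/;

// Hypothetical helper name, used only for this example.
function isLocalPhone(str) {
    return phonePattern.test(str);
}

console.log(isLocalPhone("123-3456"));  // true
console.log(isLocalPhone("123.3456"));  // true
console.log(isLocalPhone("1233456"));   // true
console.log(isLocalPhone("12-33456"));  // false: only two digits before the separator
console.log(isLocalPhone("123x3456"));  // true: the dot accepts any character
```

Note the last case: because "." matches any character, "123x3456" is accepted too. If only a dash or a period should be allowed as the separator, a character class such as [-.]? could take the place of .{0,1}.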
A Practical Example
Let's say we have a website where users are asked to enter a valid email address to continue to the next page. We are going to keep the requirements very simple for this example, such as:
- Must contain the @ symbol
- Must contain a character before the @ symbol
- Must contain a dot: .
- Must contain at least two characters between the @ symbol and the dot
- Must contain at least two characters after the dot
Let's try to validate with JavaScript using a few standard methods:
    function validEmail(str) {
        // Requirement 1: must contain the @ symbol
        var atLoc = str.indexOf("@");
        if (atLoc == -1) {
            return false;
        }
        // Requirement 2: at least one character before the @
        if (atLoc < 1) {
            return false;
        }
        // Requirement 3: must contain a dot (searched for after the @)
        var dotLoc = str.indexOf(".", atLoc + 1);
        if (dotLoc == -1) {
            return false;
        }
        // Requirement 4: at least two characters between the @ and the dot
        if (dotLoc - atLoc < 3) {
            return false;
        }
        // Requirement 5: at least two characters after the dot
        if (str.length - dotLoc < 3) {
            return false;
        }
        return true;
    }
Now I'll try to make this more efficient:
    function validEmail(str) {
        // Requirements 1 and 2: an @ with at least one character before it
        var atLoc = str.indexOf("@");
        if (atLoc < 1) {
            return false;
        }
        // Requirement 3: a dot somewhere after the @
        var dotLoc = str.indexOf(".", atLoc + 1);
        if (dotLoc == -1) {
            return false;
        }
        // Requirements 4 and 5: two or more characters on each side of the dot
        return dotLoc - atLoc >= 3 && str.length - dotLoc >= 3;
    }
Now we try using a regular expression:
    function validEmail(str) {
        // Handle all requirements 1-5
        var regEx = /^.{1,}@.{2,}\..{2,}/;
        return regEx.test(str);
    }
Fewer lines of code? Yep. Less readability? Yep.
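A quick way to see which inputs the expression accepts is to run it against a few strings. The sketch below applies the pattern from the regular expression version with the standard test method; the sample addresses are invented for illustration.

```javascript
// The pattern from the regular expression version above.
var emailPattern = /^.{1,}@.{2,}\..{2,}/;

// Sample inputs invented for this sketch.
var samples = ["a@bc.de", "@bc.de", "a@b.de", "a@bc.d", "abc.de", "a@bcde"];

// Print each sample alongside the result of the match.
samples.forEach(function (sample) {
    console.log(sample + " -> " + emailPattern.test(sample));
});
```

Only the first sample, a@bc.de, passes. Each of the others breaks exactly one of the five requirements.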