Syntax of regular expressions in JavaScript and a collection of common expressions
A regular expression describes, in compact form, a set of rules that a text must satisfy. It lets you isolate part of a text in a page and, if needed, replace it.
A regular expression is defined either by a literal or by an object.
The literal form has a special format: the expression is enclosed
between two slashes:
var er = /xyz/
The object, on the other hand, is created with the RegExp constructor from an ordinary string in quotation marks:
var er = new RegExp("xyz")
When a regular expression is entered in a form, we get an ordinary string,
so the object form must be used to assign the expression to a variable.
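A minimal sketch comparing the two forms. The variable names and the `userInput` value, which stands in for text read from a form field, are hypothetical.

```javascript
// Literal form: the pattern is written between two slashes.
const literal = /xyz/;

// Object form: the pattern comes from an ordinary string.
const fromString = new RegExp("xyz");

// Text typed into a form arrives as a string (hypothetical value here),
// so the RegExp constructor turns it into an expression.
const userInput = "x+y";
const fromForm = new RegExp(userInput);

console.log(literal.source === fromString.source); // true
console.log(literal.test("axyzb"), fromString.test("axyzb")); // true true
console.log(fromForm.test("xxxy")); // true: x+ matches "xxx", then y
```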
Building a regular expression, syntax and operators
Building an expression only requires knowing the regular expression operators,
the special characters, and the global modifiers.
Special characters
Special characters are introduced by the backslash "\". In a literal expression
(or in a form) the backslash is written once, but in a string it is doubled:
x = /a\r/
x = new RegExp("a\\r")
Coupled with a letter, the backslash represents a code that cannot be displayed
directly (a line feed, for example). It is also placed before an operator
character to designate the character itself rather than the regular
expression operator:
\n Line feed (end of line), and not the letter n.
\* The star character itself, and not the regular expression operator.
\t Tab code, and \v vertical tab.
\r Carriage return code.
\f Form feed code.
\s Any whitespace character, including space, tab, line feed, carriage return and form feed.
\S Any character other than whitespace. The opposite of \s.
\d Any digit. Same as [0-9].
\D Any non-digit character. Same as [^0-9].
\w Any word character: letter, digit or underscore. Same as [_A-Za-z0-9].
\W Any character that is NOT a word character. The opposite of \w, same as [^_A-Za-z0-9].
\1, \2... A backslash followed by a positive integer is a back-reference to the text matched by the group of that number.
\0 The NUL character (code 0), and not the digit 0 in the text.
\xhh Where hh is a hexadecimal pair: the character with that code.
\uhhhh Where hhhh is a four-digit hexadecimal number: the Unicode character with that code.
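A few of these codes tried on short sample strings; the strings are invented for the illustration.

```javascript
// \d, \s and \w are shorthand character classes.
console.log(/\d\d/.test("room 42"));  // true: two digits in a row
console.log(/\s/.test("nospace"));    // false: no whitespace at all
console.log(/^\w+$/.test("user_01")); // true: letters, digits, underscore
console.log(/^\w+$/.test("user-01")); // false: "-" is not a word character

// \* designates the star character itself, not the operator.
console.log(/2\*3/.test("2*3")); // true

// \1 refers back to the text captured by the first group: a doubled letter.
console.log(/(\w)\1/.test("letter")); // true ("tt")

// \xhh and \uhhhh designate a character by its hexadecimal code.
console.log(/\x41/.test("A"), /\u00e9/.test("caf\u00e9")); // true true
```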
Operators
By combining elements in an expression, we can apply logical operators.
With intervals in addition, a whole set of rules can be expressed in a
few characters.
The dot
The dot matches any single character in the text being compared, except
the end-of-line code.
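A short check of the dot on sample strings (chosen for the illustration):

```javascript
const re = /a.c/;
console.log(re.test("abc"));  // true: "b" is any character
console.log(re.test("a-c"));  // true
console.log(re.test("a\nc")); // false: the dot does not match a line feed
```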
Groups
()
The parentheses denote a group to recall: when the element in parentheses
is found, it is returned in the array of results and also in the variables of
the RegExp object. The pattern (.) designates any character. Coupled with
the + operator, as in (.)+, it means at least one character: either a single
character or a string.
For example, (ba) finds "ba" in "bar", "barrel" or "sidebar",
but "brain" is not accepted. The matched "ba" is then recalled.
(?:x)
Non-capturing parentheses. The x element is searched for, but it is not
stored: it appears neither in the array returned by the methods
nor in the internal variables.
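Comparing the result arrays with and without a capturing group (sample string chosen for the illustration):

```javascript
const capturing = /(ba)(r)/.exec("bar");
const nonCapturing = /(?:ba)(r)/.exec("bar");
console.log(capturing.length);    // 3: whole match, "ba", "r"
console.log(nonCapturing.length); // 2: whole match, "r"
console.log(nonCapturing[1]);     // r
```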
[]
The square brackets designate a set of alternatives: we are looking for any
one of the characters in the list. When [abc] is searched,
"ara", "bridget" and "corel" match (if we are testing the first
letter).
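Testing the first letter with ^[abc]; the word "diane" is added here as a counter-example.

```javascript
// ^[abc] : the first character must be a, b or c.
const firstLetter = /^[abc]/;
const words = ["ara", "bridget", "corel", "diane"];
console.log(words.map(w => w + ": " + firstLetter.test(w)).join(", "));
// ara: true, bridget: true, corel: true, diane: false
```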
Interval
-
The dash symbol between two letters or digits, inside square brackets, designates an interval.
Examples:
a-z range of lowercase letters. Any letter in the range can match.
A-Z range of capitals.
0-9 range of digits.
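Intervals used inside square brackets, alone and combined (sample words are invented):

```javascript
console.log(/^[a-z]+$/.test("hello"));          // true
console.log(/^[a-z]+$/.test("Hello"));          // false: "H" is outside a-z
console.log(/^[A-Za-z0-9]+$/.test("Hello42")); // true: several intervals combined
```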
Operators of parts
These symbols are used to designate a specific part of the text to compare
with the regular expression.
^
Specifies that the following element, a character or a group, must be
at the beginning of the text for the search to match. If the pattern
is /^a/, the text "angela" matches but "christina" does not.
With a text of several lines and the "m" modifier,
this applies to the beginning of each line.
$
Specifies that the preceding element, a character or a group, must be at the end
of the text. If the pattern is /a$/, the string "angela" matches,
but "angel" does not.
With a text of several lines and the "m" modifier,
this applies to the end of each line.
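The two anchors tried on the example names (the name "angel" is an added counter-example):

```javascript
console.log(/^a/.test("angela"));    // true: starts with "a"
console.log(/^a/.test("christina")); // false
console.log(/a$/.test("angela"));    // true: ends with "a"
console.log(/a$/.test("angel"));     // false
```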
?
The preceding character or group is optional: it may be present once or not at all.
This allows a character to be skipped when it is present, so that the rest of the
expression applies to the part of the text that comes after it.
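A small sketch of the optional operator; the word "colour" is only an assumed example.

```javascript
// "u?" makes the letter u optional.
const re = /colou?r/;
console.log(re.test("color"));   // true: no "u"
console.log(re.test("colour"));  // true: one "u"
console.log(re.test("colouur")); // false: at most one "u"
```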
Operators of quantity
+
The preceding letter or group must be present one or more times.
Examples:
a+ there must be one or several letters a.
[abc]+ there must be one or more letters, each of them a, b or c, in any order (for example "a", "cab" or "bbb").
*
There may be any number of occurrences of the preceding element, or none.
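The + and * quantifiers compared on sample strings; the ^ and $ anchors are added so that the whole string is tested.

```javascript
console.log(/^a+$/.test("aaa"));     // true: one or more "a"
console.log(/^a+$/.test(""));        // false: at least one is required
console.log(/^[abc]+$/.test("cab")); // true: any mix of a, b and c
console.log(/^ab*c$/.test("ac"));    // true: zero "b"
console.log(/^ab*c$/.test("abbbc")); // true: several "b"
```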
{n}
n is an integer. This is the number of occurrences that are expected.
Example:
a{2} looks for a string that contains "aa".
{x,y}
x and y represent two positive integers. There will be at least x occurrences
and no more than y occurrences. No space is allowed inside the braces.
For example, {2,3} searches for two or three occurrences of a string.
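A sketch of both counted forms. The last two lines show why the braces must not contain a space: as long as the u flag is not used, JavaScript engines then read "{2, 3}" as ordinary characters rather than as a quantifier.

```javascript
console.log(/a{2}/.test("baab")); // true: contains "aa"
console.log(/a{2}/.test("bab"));  // false

// Between 2 and 3 digits, whole string.
const re = /^\d{2,3}$/;
console.log(re.test("7"), re.test("42"), re.test("365"), re.test("2024"));
// false true true false

// With a space inside the braces, the text is matched literally.
console.log(/^a{2, 3}$/.test("aa"));      // false
console.log(/^a{2, 3}$/.test("a{2, 3}")); // true
```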
Logical operators
x | y
The bar is the inclusive OR operator.
Example: (abc|def)
We are looking for a string that contains either abc or def (or both).
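The alternative on a few sample strings; when both parts are present, exec() returns the first one found in the text.

```javascript
const re = /(abc|def)/;
console.log(re.exec("xxdefxx")[1]); // def
console.log(re.exec("abcdef")[1]);  // abc: the first match found
console.log(re.test("abxdex"));     // false
```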
[^]
The symbol "^" placed first inside square brackets does not mean the beginning
of a string: it excludes the characters that follow it.
Example:
[^xyz]
This expression
represents any character except x, y and z.
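The excluded class in a test and in a replacement with the g modifier (sample strings are invented):

```javascript
// [^xyz] matches one character that is not x, y or z.
console.log(/^[^xyz]+$/.test("abc")); // true
console.log(/^[^xyz]+$/.test("abz")); // false: "z" is excluded
console.log("xaybzc".replace(/[^xyz]/g, "")); // xyz: everything else is removed
```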
Conditional operators
x(?=y)
The text matches
when x is followed by y.
Example:
me(?=she)
When "me" is directly followed by "she" in the text,
the expression matches. Only "me" is part of the match; to add "she"
to the array of results as well, one writes: me(?=(she))
Example:
[0-9]+(?=\.)\.[0-9]+
Represents a decimal number: a string of digits, a dot, and decimals. This can be
written more simply: \d+\.\d+
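A sketch of what the lookahead puts in the result array, with a capturing group inside the lookahead to recall "she" too. The decimal example assumes the digits are written with [0-9] and that the dot is consumed after the lookahead checks it.

```javascript
// The lookahead checks "she" without including it in the match.
const plain = /me(?=she)/.exec("meshe");
console.log(plain[0], plain.length); // me 1

// A group inside the lookahead also captures "she".
const captured = /me(?=(she))/.exec("meshe");
console.log(captured[0], captured[1]); // me she

console.log(/me(?=she)/.test("mehe")); // false

// Decimal number, with the lookahead and in the simpler form.
console.log(/[0-9]+(?=\.)\.[0-9]+/.exec("pi is 3.14")[0]); // 3.14
console.log(/\d+\.\d+/.exec("pi is 3.14")[0]);            // 3.14
```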
x(?!y)
The text x matches
if it is not followed by y.
To represent a
whole number we could write:
[0-9]+(?!\.) but [0-9]+ alone would
be simpler.
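The negative lookahead on sample strings. The last lines show a caveat worth knowing: without anchors, [0-9]+(?!\.) can still match the digits after the dot of a decimal number, so anchoring the whole string (an addition here, not part of the text above) is a stricter integer check.

```javascript
// "me" matches only when it is not directly followed by "she".
console.log(/me(?!she)/.test("meshe"), /me(?!she)/.test("metal")); // false true

// Without anchors, the decimal part "14" is accepted.
console.log(/[0-9]+(?!\.)/.exec("3.14")[0]); // 14

// Anchoring the whole string rejects the decimal number.
console.log(/^[0-9]+$/.test("3.14"), /^[0-9]+$/.test("42")); // false true
```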
Important note
In a string, the backslash "\" must be doubled. For example, you write \\d to represent the symbol \d, a digit. This is not the case when
one enters the regular expression in a form, or in the literal form:
/\d+/
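What happens when the backslash is not doubled in a string: "\d" in a JavaScript string is just the letter d, so the expression silently becomes d+. The variable names are hypothetical.

```javascript
const good = new RegExp("\\d+"); // the string contains \d+
const bad = new RegExp("\d+");   // "\d" in a string is just "d": the pattern is d+
console.log(good.source, bad.source);           // \d+ d+
console.log(good.test("123"), bad.test("123")); // true false
console.log(good.source === /\d+/.source);      // true: same as the literal
```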
Modifiers
Modifiers are letters that apply a general rule to the whole regular expression.
For example, the
letter i means that no difference is made between uppercase and lowercase.
The main modifiers are the letters i, g and m.
var er = /xyz/i
var er = new RegExp("xyz", "i")
You can use one or more modifiers at a
time.
For example:
var er = /xyz/igm
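Both ways of writing the modifiers give the same flags; the `flags` property lists them in a fixed order, whatever order they were written in.

```javascript
const a = /xyz/igm;
const b = new RegExp("xyz", "igm");
console.log(a.flags, b.flags); // gim gim
console.log(a.ignoreCase, a.global, a.multiline); // true true true
```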
Case insensitivity
The "i" code states that no difference is made between uppercase and
lowercase in the text. For example, if one applies the regular expression
/doe/i, the string "doe" gives the same result as "Doe" or "DOE".
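The "i" modifier tried on the three spellings, plus "dog" as an added counter-example:

```javascript
const re = /doe/i;
console.log(["doe", "Doe", "DOE", "dog"].map(s => re.test(s)).join(" "));
// true true true false
```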
Global
The "g" code indicates a global search: all occurrences in the text are processed, instead of only the first one.
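The effect of "g" with the string methods match() and replace(); the sample text is invented.

```javascript
const text = "cat, bat, rat";
console.log(text.match(/[a-z]at/)[0]);          // cat: first occurrence only
console.log(text.match(/[a-z]at/g).join(", ")); // cat, bat, rat
console.log(text.replace(/at/, "og"));          // cog, bat, rat
console.log(text.replace(/at/g, "og"));         // cog, bog, rog
```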
Multiple lines
The "m" code states that the regular expression is applied to each
line of a text made of several lines separated by the end-of-line code: ^ and $
then match at the beginning and end of each line, not only of the whole text.
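A two-line sample text, searched with and without "m" (combined with "g" to collect every match):

```javascript
const text = "first line\nsecond line";
console.log(text.match(/^\w+/g).join(", "));  // first: ^ only at the start of the text
console.log(text.match(/^\w+/gm).join(", ")); // first, second: ^ at the start of each line
console.log(text.match(/\w+$/gm).join(", ")); // line, line: $ at the end of each line
```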
Methods of RegExp and modifier
A method of the RegExp object may be called directly on a literal expression:
/xyz/i.exec("xxx")
The method applies not to the "i" code but to the whole
expression /xyz/i.
This is similar to:
er = /xyz/i
er.exec("xxx");
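What the two forms return on the text "xxx" and on an added sample, "aXYZb":

```javascript
console.log(/xyz/i.exec("xxx"));         // null: "xyz" does not appear
console.log(/xyz/i.exec("aXYZb")[0]);    // XYZ: "i" ignores the case
const er = /xyz/i;
console.log(er.exec("aXYZb").index);     // 1: position of the match
```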
List of commonly used regular expressions
Examples of regular expressions that could be commonly used to recognize
a string or to modify it.
In source code, the expressions must be enclosed between two slashes, or between
quotation marks when they are passed as a string to RegExp.
Check if we have an integer
-?[0-9]+
A decimal number
-?\d+\.\d+
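Used as validation, these expressions need the ^ and $ anchors (added here, not in the list above); otherwise a number found anywhere in the string is enough. The helper names are hypothetical.

```javascript
// Hypothetical helpers: the anchors make the whole string be checked.
const isInteger = s => /^-?[0-9]+$/.test(s);
const isDecimal = s => /^-?\d+\.\d+$/.test(s);

console.log(isInteger("-42"), isInteger("4.2"), isInteger("abc12")); // true false false
console.log(isDecimal("3.14"), isDecimal("-0.5"), isDecimal("3."));  // true true false
console.log(/-?[0-9]+/.test("abc12")); // true: without anchors, a part is enough
```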
An alphanumeric string
Made only of alphabetical letters, lowercase or uppercase, or digits.
^[a-zA-Z0-9]+$
The full code, inside a validation function that receives the string str:
var re = new RegExp("^[a-zA-Z0-9]+$");
if (!re.test(str)) return false;
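Why the "g" modifier is best left out of a simple validity test: with "g", test() stores the position where the last match ended in lastIndex and resumes from there, so reusing the same object on the same string can fail. The sample string is invented.

```javascript
const withG = new RegExp("^[a-zA-Z0-9]+$", "g");
console.log(withG.test("abc123")); // true
console.log(withG.lastIndex);      // 6: the search position is remembered
console.log(withG.test("abc123")); // false: the search restarted at index 6

const withoutG = /^[a-zA-Z0-9]+$/;
console.log(withoutG.test("abc123"), withoutG.test("abc123")); // true true
```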
Removing quotation marks
This may be useful when the content of an HTML file is parsed.
["']([^"']*)["']
var re = /["']([^"']*)["']/;
var test = "'some text'";
document.write(test.length);
var arr = re.exec(test);
document.write(arr[1].length);
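To actually remove the quotation marks, the same pattern can be combined with replace() and the "g" modifier, keeping only the quoted content through $1. The HTML fragment is a made-up sample.

```javascript
const re = /["']([^"']*)["']/g;
const html = `<a href="page.html" title='Home'>`;
console.log(html.replace(re, "$1")); // <a href=page.html title=Home>
```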
How to validate an email address
^[\w.-]+@[\w.]+\.\w+$
var re = /^[\w.-]+@[\w.]+\.\w+$/;
if (re.test(email)) document.write("valid");
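A sketch wrapping an anchored version of the email expression in a helper (the name `isEmail` is hypothetical). The ^ and $ anchors ensure the whole string is checked; this is still only a rough check, not a complete validation of every legal address.

```javascript
// Hypothetical helper: local part, "@", domain with at least one dot.
const isEmail = s => /^[\w.-]+@[\w.]+\.\w+$/.test(s);

console.log(isEmail("john.doe@example.com")); // true
console.log(isEmail("john@localhost"));       // false: no dot after "@"
console.log(isEmail("see john@example.com")); // false: text before the address
```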
How to validate a URL with a regular expression
(http://|ftp://)([\w-\.]+)(\.)([a-zA-Z]+)
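A sketch of the URL expression as a literal, assuming the second group is meant to be ([\w-\.]+), written here as the equivalent [\w.-]. Inside a literal, the slashes of "http://" must be escaped as \/. The sample sentence is invented.

```javascript
const re = /(http:\/\/|ftp:\/\/)([\w.-]+)(\.)([a-zA-Z]+)/;
const m = re.exec("Visit http://www.xul.fr today");
console.log(m[0]);       // http://www.xul.fr
console.log(m[2], m[4]); // www.xul fr
```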
Replace the trim() function
str = str.replace(/^\s\s*/, '').replace(/\s\s*$/, '')
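The same two replacements wrapped in a function (the name `trimString` is hypothetical) and compared with the built-in String.prototype.trim():

```javascript
function trimString(str) {
  // Remove leading whitespace, then trailing whitespace.
  return str.replace(/^\s\s*/, "").replace(/\s\s*$/, "");
}

console.log("[" + trimString("  \t hello world \n") + "]"); // [hello world]
console.log(trimString("  abc ") === "  abc ".trim());      // true
```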
Online tool
Online tool to test regular expressions in JavaScript.
Buttons corresponding to the operators help to define an expression that
applies to different types of texts, predefined and modified by the user.
See also
The RegExp object.
It has methods to perform global processing.
© 2008-2012 Xul.fr