Regular expressions for text processing on strings, pattern matching, scanners. Text processing predicates:
Regular expressions are sequences of simple characters and meta characters. All characters which are not meta characters stand for themselves.
a       stands for the character "a"
ab      stands for the character sequence "ab"
%%      stands for the character "%"
|       disjunction
(...)   bracketed expression
.       any single character
^       first position of a string
$       last position of a string
%d      digit
%D      not a digit
%s      white space
%w      character of a word (letter or digit or _)
%W      not a character of a word
%<      start of a word, i.e. current character is %w and previous is not %w
%>      end of a word, i.e. current character is not %w and previous is %w
%i      all following characters are case insensitive
%I      all following characters are case sensitive
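For readers more familiar with backslash-style syntax, the %-escapes listed above can be translated mechanically. The following is only a sketch in Python, not part of the predicates documented here: the helper name percent_to_re and its mapping table are assumptions made for illustration, %i and %I are deliberately not handled, and all other characters are passed through unchanged, so it is only meaningful where both syntaxes agree.

```python
import re

# Hypothetical mapping from the %-escapes of this page to Python `re` syntax.
# %< and %> are expressed with lookarounds that follow the definitions given
# in the table (current/previous character is or is not a word character).
_PERCENT_MAP = {
    "%": "%",
    "d": r"\d",
    "D": r"\D",
    "s": r"\s",
    "w": r"\w",
    "W": r"\W",
    "<": r"(?<!\w)(?=\w)",
    ">": r"(?<=\w)(?!\w)",
}


def percent_to_re(pattern):
    """Translate %-escapes into Python `re` syntax (illustrative helper)."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "%":
            if i + 1 >= len(pattern):
                raise ValueError("dangling % at end of pattern")
            key = pattern[i + 1]
            if key not in _PERCENT_MAP:
                raise ValueError("unsupported escape %" + key)
            out.append(_PERCENT_MAP[key])
            i += 2
        else:
            # Everything else is copied as-is.
            out.append(ch)
            i += 1
    return "".join(out)


# re.ASCII restricts \w and \d to [a-zA-Z0-9_] and [0-9], as in the table.
words = re.findall(percent_to_re("%<(%w+)"), " one two three ", re.ASCII)
print(words)
```

With re.ASCII the class \w matches exactly [a-zA-Z0-9_]; without that flag Python also accepts non-ASCII letters and digits.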
[...]   one of the listed characters
[^...]  not one of the listed characters
The character "-" is used within a class to define a range, e.g. [a-zA-Z0-9_] is the same as %w.
(?:...) simple grouping (no side effects)
(?=...) positive lookahead: true if ... follows, but ... is not consumed
(?!...) negative lookahead: true if ... does not follow; nothing is consumed
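Python's re module happens to use the same notation for these three constructs, so their behaviour can be tried out there. This is a sketch of the general technique only; the sample strings are made up, and it is an assumption that the predicates on this page behave identically in every corner case.

```python
import re

# (?:...) groups without capturing: only the "c" group is reported.
non_capturing = re.match(r"(?:ab)+(c)", "ababc").groups()

# (?=...) positive lookahead: digits must be followed by "px",
# but "px" itself is not part of the match.
positive = re.findall(r"\d+(?=px)", "10px 20em 30px")

# (?!...) negative lookahead: "foo" only where "bar" does not follow.
negative = [m.start() for m in re.finditer(r"foo(?!bar)", "foobar foobaz")]

print(non_capturing, positive, negative)
```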
Quantifiers are given as suffixes. The following quantifiers are defined:
*          any number of times
+          at least once
?          none or one
{min,max}  "min" to "max" times
{min,}     at least "min" times
{n}        exactly "n" times
Quantifiers are greedy: they try to match the longest possible string. To match the shortest possible string instead, append a "?" to the quantifier.
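The difference between the longest and the shortest match can be seen in a small Python sketch. Python's re module uses the same quantifier suffixes; it serves here only as a stand-in, and the input strings are illustrative.

```python
import re

text = "barbara"

# Default quantifier: .* takes as much as it can before the final "a".
longest = re.match(r"(.*)a", text).group(1)

# With "?" appended: .*? stops at the first "a" that lets the match succeed.
shortest = re.match(r"(.*?)a", text).group(1)

# {min,max}: runs of two to three digits.
bounded = re.findall(r"\d{2,3}", "1 22 333 4444")

print(longest, shortest, bounded)
```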
Note: to include a backslash in an atom, the backslash must be written twice ("escape of an atom").
Note: to use a literal percent mark in an expression, the percent mark must be written twice ("escape of the regular expression").
match('(%d+)', 'one123four', L) => L = ['123']
match('(.*)a', barbara, L) => L = [barbar]
match('(.*?)a', barbara, L) => L = [b]
match_all('%<(%w+)', ' one two three ', L)
=> L = [one,two,three]
split('one two three', L) => L = [one,two,three]
chop(' one two ', L) => L = 'one two'
split('%s*:%s*', 'one : two :three', L)
=> L = [one,two,three]
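For comparison, the splitting and trimming examples above have close analogues in Python. This is a rough sketch, not a description of how split/2, split/3 or chop/2 are implemented; treating chop/2 as whitespace trimming is an assumption based on the single example given.

```python
import re

# Analogue of split/2: split on runs of white space.
by_space = "one two three".split()

# Analogue of chop/2 (assumed): remove leading and trailing white space.
chopped = " one two ".strip()

# Analogue of split/3: split on a separator pattern, here a colon
# with optional white space on either side.
by_colon = re.split(r"\s*:\s*", "one : two :three")

print(by_space, repr(chopped), by_colon)
```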
substitute_all('a(.)', barbara, '%1a', L).
=> L = brabraa
For example, to exchange the positions of two words:
substitute('(%w+) (%w+)', 'one two', '%2 %1', L).
=> L = 'two one'
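The same two substitutions can be reproduced in Python, where group references in the replacement are written \1 and \2 instead of %1 and %2. This is only a sketch for comparison; mapping substitute/4 to a single replacement (count=1) is an assumption based on its name.

```python
import re

# Analogue of substitute_all/4: swap every "a" with the character after it.
swapped_all = re.sub(r"a(.)", r"\1a", "barbara")

# Analogue of substitute/4 (assumed to replace the first match only):
# exchange the positions of two words.
swapped_words = re.sub(r"(\w+) (\w+)", r"\2 \1", "one two", count=1)

print(swapped_all, swapped_words)
```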
Whatever is bracketed is assigned to a term: the expression enclosed by the i-th opening bracket, counted from left to right, is assigned to the i-th term. Bracketed terms can be referenced in substitute/4 and substitute_all/4 with %1...%9 in the substitute expression. For match/3 and match_all/3 they are returned in the corresponding order. If a bracketed expression was not evaluated, e.g. because it appeared in a branch of a disjunction that did not match, its result is ''.
Examples:
match_all('((%w+)|(%s+))', 'a few tokens', L).
=> L = [[a,a,''],[' ','',' '],[few,few,''],[' ','',' '],[tokens,tokens,'']]
match_all('%<(%w+)', 'a few tokens', L).
=> L = [[a],[few],[tokens]]
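The left-to-right numbering of brackets and the empty result for unevaluated brackets can be illustrated with a Python sketch of the first example. Python reports an unevaluated group as None, so the sketch converts None to '' to mirror the convention described above; the variable names are illustrative.

```python
import re

pattern = r"((\w+)|(\s+))"  # group 1: whole token, group 2: word, group 3: space
rows = [
    # One row per match, one entry per opening bracket, left to right.
    ["" if g is None else g for g in m.groups()]
    for m in re.finditer(pattern, "a few tokens")
]

for row in rows:
    print(row)
```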
match/2/3
match_all/3
substitute/4
substitute_all/4
chop/2
split/2/3
get_line/1/2