Regular expressions for text processing on strings, pattern matching, scanners. Text processing predicates:
Regular Expressions:
Regular expressions are sequences of simple characters and meta characters. Any character that is not a meta character stands for itself.
Examples:
a        stands for the character "a"
ab       stands for the character sequence "ab"
%%       stands for the character "%"
Meta characters:
|        disjunction
(...)    bracketed expression
.        any single character
^        first position of a string
$        last position of a string
%d       digit
%D       not a digit
%s       white space
%w       character of a word (letter or digit or _)
%W       not a character of a word
%<       start of a word, i.e. the current character is %w and the previous one is not
%>       end of a word, i.e. the current character is not %w and the previous one is
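For readers more familiar with backslash-style regular expressions, the %-escapes listed above can be mapped onto Python's `re` syntax. The following is only a rough sketch of that mapping, not part of this library: the function name `to_python_regex` and the table `PERCENT_ESCAPES` are hypothetical, and the word-boundary translations for %< and %> are an assumption based on the descriptions given here. The case switches %i and %I are not handled.

```python
import re

# Hypothetical mapping from the %-escapes described above to Python `re` syntax.
PERCENT_ESCAPES = {
    "%": "%",           # %% is a literal percent sign
    "d": r"\d",         # digit
    "D": r"\D",         # not a digit
    "s": r"\s",         # white space
    "w": r"\w",         # character of a word
    "W": r"\W",         # not a character of a word
    "<": r"\b(?=\w)",   # start of a word (assumed equivalent)
    ">": r"\b(?<=\w)",  # end of a word (assumed equivalent)
}

def to_python_regex(pattern):
    """Translate a %-style pattern into a Python regex string (sketch only)."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "%" and i + 1 < len(pattern) and pattern[i + 1] in PERCENT_ESCAPES:
            out.append(PERCENT_ESCAPES[pattern[i + 1]])
            i += 2
        else:
            # Everything else is copied unchanged.
            out.append(ch)
            i += 1
    return "".join(out)

print(to_python_regex("%<(%w+)"))  # \b(?=\w)(\w+)
```

Passing `re.ASCII` when compiling the translated pattern keeps `\w` restricted to letters, digits and `_`, which is how %w is described above.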
Case sensitivity:
%i       all following characters are case insensitive
%I       all following characters are case sensitive
Classes of characters:
[...]    one of the listed characters
[^...]   not one of the listed characters
The character "-" is used within a class to define a range, e.g. [a-zA-Z0-9_] is the same as %w.
Grouping and lookahead:
(?:...)  simple grouping (no side effects)
(?=...)  positive lookahead: true if ... follows, but ... is not consumed
(?!...)  negative lookahead: true if ... does not follow; nothing is consumed
Quantifiers:
Quantifiers are given as suffixes. The following quantifiers are defined:
Examples:
match('(%d+)', 'one123four', L) => L = ['123']
match('(.*)a', barbara, L) => L = [barbar]
match('(.*?)a', barbara, L) => L = [b]
match_all('%<(%w+)', ' one two three ', L)
=> L = [one,two,three]
split('one two three', L) => L = [one,two,three]
chop(' one two ', L) => L = 'one two'
split('%s*:%s*', 'one : two :three', L)
=> L = [one,two,three]
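The difference between the two barbara examples is greedy versus non-greedy matching. As an illustration only, the same behaviour can be reproduced with Python's `re` module, which is an analogue and not this library; `re.search` and `re.split` stand in for match/3 and split/3 here on the assumption that their semantics are comparable.

```python
import re

# Greedy: .* takes as much as possible, then backs off to the last "a".
greedy = re.search(r"(.*)a", "barbara").group(1)

# Non-greedy: .*? takes as little as possible and stops before the first "a".
lazy = re.search(r"(.*?)a", "barbara").group(1)

print(greedy)  # barbar
print(lazy)    # b

# Splitting on a separator pattern: a colon with optional surrounding white space.
fields = re.split(r"\s*:\s*", "one : two :three")
print(fields)  # ['one', 'two', 'three']
```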
Bracketed Units:
Whatever is bracketed is assigned to a term. Counting opening brackets from left to right, the content of the i-th bracket is assigned to the i-th term. Bracketed terms can be referenced in substitute/4 and substitute_all/4 with %1...%9 in the substitute expression. For match/3 and match_all/3 they are returned in the corresponding order. If a bracketed expression was not evaluated, e.g. because it appeared in a branch of a disjunction that was not taken, its result is ''.
Examples:
match_all('((%w+)|(%s+))', 'a few tokens', L).
=> L = [[a,a,''],[' ','',' '],[few,few,''],[' ','',' '],[tokens,tokens,'']]
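The numbering of bracketed units and the '' result for unevaluated brackets can be illustrated with Python's `re` module. This is a sketch of the same idea in a different regex engine, not this library's implementation; the helper name `match_all_groups` is hypothetical.

```python
import re

def match_all_groups(pattern, text):
    """For each match, return its bracketed groups in order of opening bracket.

    Groups that did not take part in the match are reported as ''.
    """
    return [list(m.groups(default="")) for m in re.finditer(pattern, text)]

result = match_all_groups(r"((\w+)|(\s+))", "a few tokens")
print(result)
# [['a', 'a', ''], [' ', '', ' '], ['few', 'few', ''], [' ', '', ' '], ['tokens', 'tokens', '']]
```

Group 1 is the outer bracket, so it always holds the whole token; groups 2 and 3 belong to the two branches of the disjunction, and only the branch that matched is non-empty.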