Text Processing and Regular Expressions

Regular expressions support text processing on strings, such as pattern matching and scanning. The text processing predicates are:

  • match/2/3
  • match_all/3
  • substitute/4
  • substitute_all/4
  • split/2/3
  • chop/2
  • get_line/1/2

Regular Expressions:

Regular expressions are sequences of simple characters and meta characters. All characters which are not meta characters stand for themselves.

Examples:

	a		stands for the character "a"
	ab		stands for the character sequence "ab"
	%%		stands for the character "%"

Meta characters:

	|		disjunction
	(...)		bracketed expression
	.		any single character
	^		first position of a string
	$		last position of a string
	%d		digit
	%D		not a digit
	%s		white space
	%w		character of a word (letter or digit or _)
	%W		not a character of a word
	%<		start of a word 
			i.e. current character is %w and previous is not %w
	%>		end of a word
			i.e. current character is not %w and previous is %w

Case sensitivity

	%i		all following characters are case insensitive
	%I		all following characters are case sensitive
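
As an illustration of the switch, here is a minimal sketch that uses only match/3 and %i as described on this page. The input string is made up, and the result noted in the comment is derived from the rule above, not from a run against MINERVA.

```prolog
% %i makes the rest of the expression case insensitive,
% so the group should capture 'HELLO' from the input.
?- match('%i(hello)', 'Say HELLO', L).
```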

Classes of characters

	[...]		one of the listed characters
	[^...]		not one of the listed characters
	The character "-" is used within a class to define a range,
	e.g. [a-zA-Z0-9_] is the same as %w.
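
The two queries below sketch a class with ranges and a negated class. The input strings are made up for illustration, and the results in the comments follow from the rules above, not from a run against MINERVA.

```prolog
% [0-9a-fA-F] combines three ranges: digits and the hex letters
% in both cases. The group should capture '1F'.
?- match('0x([0-9a-fA-F]+)', 'mask 0x1F here', L).

% [^,] is any character except a comma.
% The group should capture 'one'.
?- match('([^,]+)', 'one,two', L).
```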

Grouping and lookahead

	(?:...)		simple grouping (no side effects)
	(?=...)		positive lookahead,
			true if ... follows, but ... is not consumed
	(?!...)		negative lookahead,
			true if ... does not follow, nothing is consumed
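
The following queries sketch the three constructs. They assume that "no side effects" means a (?:...) group is not assigned to a term. The input strings are made up, and the results in the comments are derived from the descriptions above, not from a run against MINERVA.

```prolog
% (?:ab)+ only groups the repetition; under the assumption above
% just the bracketed (c) is returned, and it should capture 'c'.
?- match('(?:ab)+(c)', 'ababc', L).

% Positive lookahead: a word that is followed by ":".
% The ":" is tested but not consumed; the group should capture 'name'.
?- match('(%w+)(?=:)', 'name: value', L).

% Negative lookahead: a "q" that is not followed by "u".
% The "q" of "quit" is rejected; the group should capture
% the 'a' of "qat".
?- match('q(?!u)(.)', 'quit qat', L).
```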

Quantifiers

	Quantifiers are given as suffixes.
	The following quantifiers are defined:

	*		any number of times
	+		at least once
	?		none or one
	{min,max}	"min" to "max" times
	{min,}		at least "min" times
	{n}		exactly "n" times

Quantifiers are greedy: they match the longest possible string. To match the shortest possible string instead, append a "?" to the quantifier, e.g. "*?".
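
A short sketch of a counted quantifier and of the difference between the longest and the shortest match. The input strings are made up, and the results in the comments follow from the rules above, not from a run against MINERVA.

```prolog
% {4}: exactly four digits; the group should capture '2008'.
?- match('(%d{4})', 'published 2008/7/21', L).

% Plain +: runs to the last ">", so the group should capture 'a><b'.
?- match('<(.+)>', '<a><b>', L).

% +? (shortest match): stops at the first ">",
% so the group should capture 'a'.
?- match('<(.+?)>', '<a><b>', L).
```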

Note: to include a backslash in an atom, the backslash must be written twice (the escape of the atom).

Note: to use a literal percent sign in a regular expression, the percent sign must be written twice (the escape of the regular expression).
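
Both escapes can be sketched as follows. The second query assumes that the backslash is not itself a meta character, since it is not listed as one on this page. The input strings are made up, and the results in the comments are derived from the notes above, not from a run against MINERVA.

```prolog
% %% matches a literal "%"; the group should capture '15'.
?- match('(%d+)%%', 'up 15% today', L).

% Inside a quoted atom a doubled backslash denotes one backslash,
% so both the expression and the string contain a single backslash.
% Assumption: the backslash stands for itself in the expression.
% The two groups should capture 'dir' and 'file'.
?- match('(%w+)\\(%w+)', 'dir\\file', L).
```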

Examples:

	match('(%d+)', 'one123four', L)		=>	L = ['123']
	match('(.*)a', barbara, L)		=>	L = [barbar]
	match('(.*?)a', barbara, L)		=>	L = [b]
	match_all('%<(%w+)', ' one  two  three  ', L)
						=>	L = [one,two,three]
	split('one two three', L)		=>	L = [one,two,three]
	chop('  one two  ', L)			=>	L = 'one two'
	split('%s*:%s*', 'one   :  two  :three', L)
						=>	L = [one,two,three]
	substitute_all('a(.)', barbara, '%1a', L)
						=>	L = brabraa

For example, to exchange the position of two words:

	substitute('(%w+) (%w+)', 'one two', '%2 %1', L)
						=>	L = 'two one'

Bracketed Units

Each bracketed subexpression is assigned to a term. Brackets are counted by their opening bracket from left to right: whatever is bracketed by the i-th opening bracket is assigned to the i-th term. Bracketed terms can be referenced in substitute/4 and substitute_all/4 with %1...%9 in the substitute expression. match/3 and match_all/3 return them in the same order. If a bracketed expression was not evaluated, e.g. because it appeared in a disjunction, its result is ''.

Examples:

	match_all('((%w+)|(%s+))', 'a few tokens', L)
		=>	L = [[a,a,''],[' ','',' '],[few,few,''],[' ','',' '],[tokens,tokens,'']]

	match_all('%<(%w+)', 'a few tokens', L)
		=>	L = [[a],[few],[tokens]]
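
Two further sketches of the numbering rule, one with nested brackets and one with references in a substitute expression. They assume that "-", "/" and, in the substitute expression, "." stand for themselves. The input strings are made up, and the results in the comments are derived from the rules above, not from a run against MINERVA.

```prolog
% Nested brackets are numbered by their opening bracket:
% term 1 should be '10-20', term 2 '10' and term 3 '20'.
?- match('((%d+)-(%d+))', 'pages 10-20', L).

% %1, %2 and %3 refer to the three bracketed terms; the substitute
% expression lists them in reverse order, so the matched text
% '2008/7/21' should be replaced by '21.7.2008'.
?- substitute('(%d+)/(%d+)/(%d+)', '2008/7/21', '%3.%2.%1', L).
```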

Read on:

  • match/2/3: Tests if the regular expression "regexp" accepts "string". ...
  • match_all/3: Tests if the regular expression "regexp" accepts "string" and returns a list of matches...
  • substitute/4: Replaces substrings ...
  • substitute_all/4: Replaces all substrings...
  • chop/2: Cuts off leading and trailing white space....
  • split/2/3: Returns the tokens of a string...
  • get_line/1/2
document: http://www.ifcomputer.co.jp/MINERVA/Manual/Reference/Predicates/regexp/home_de.html
published 2008/7/21 update 2002/7/4 (c) 1996-2006 IF Computer Japan
IF Computer 5-28-2 Sendagi, Bunkyo-ku Tel +81-3-5814-3352 info@ifcomputer.com
Customer Support Tokyo 113-0022 Japan   http://www.ifcomputer.com