Regular Expressions
A regular expression is a powerful way of specifying a pattern for a complex search. The simplest regular expression is one that has no special characters in it. For example, the regular expression canine matches canine and nothing else.
Non-trivial regular expressions use certain special constructs so that they can match more than one string. For example, the regular expression canine|feline matches either the string canine or the string feline.
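The difference between a plain pattern and one using the | construct can be tried out in a few lines. This is a minimal sketch using Python's `re` module as a stand-in; the helper name `matches_exactly` is hypothetical, and Data Major's own regular expression engine may differ in details such as case sensitivity.

```python
import re

def matches_exactly(pattern: str, text: str) -> bool:
    """Return True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# A pattern with no special characters matches only itself.
print(matches_exactly(r"canine", "canine"))         # True
print(matches_exactly(r"canine", "feline"))         # False

# The | construct lets one pattern match more than one string.
print(matches_exactly(r"canine|feline", "canine"))  # True
print(matches_exactly(r"canine|feline", "feline"))  # True
print(matches_exactly(r"canine|feline", "bovine"))  # False
```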
Data Major uses these regular expressions in most places where you can enter details. The full list is given below; only some entries have examples.
| Pattern | Description | Example | Meaning of example |
|---|---|---|---|
| ^ | Match the beginning of a string. | ^Canine | The text STARTS with the word Canine |
| $ | Match the end of a string. | Terrier$ | The text ENDS with the word Terrier |
| de\|abc | Match either of the sequences de or abc. | Canine\|Feline | Matches EITHER the text Canine OR Feline |
| . | Match any single character. | | |
| a* | Match any sequence of zero or more a characters. | | |
| a+ | Match any sequence of one or more a characters. | | |
| a? | Match either zero or one a character. | | |
| (abc)* | Match zero or more instances of the sequence abc. | | |
| [a-dX] [^a-dX] | Match any character that is (or is not, if ^ is used) either a, b, c, d or X. A - character between two other characters forms a range that matches all characters from the first character to the second. For example, [0-9] matches any decimal digit. | | |

To use a literal instance of a special character in a regular expression, precede it with two backslash (\) characters. For example, to match the string 1+2, which contains the special + character, use 1\\+2
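The constructs in this list can be checked against sample text. The sketch below uses Python's `re` module purely as an illustration; the helper `found` and the sample strings are hypothetical, and Data Major's engine may behave differently (for example in case sensitivity). Note that Python raw strings need only one backslash to escape a special character, while an ordinary Python string literal needs two, which resembles the two-backslash form described above. Whether Data Major requires it for the same reason is an assumption.

```python
import re

def found(pattern: str, text: str) -> bool:
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None

# ^ and $ anchor the match to the start and end of the text.
print(found(r"^Canine", "Canine vaccination"))   # True
print(found(r"^Canine", "Large Canine"))         # False
print(found(r"Terrier$", "Yorkshire Terrier"))   # True

# Counts: * is zero or more, + is one or more, ? is zero or one.
print(re.fullmatch(r"ba*", "b") is not None)          # True
print(re.fullmatch(r"ba+", "b") is not None)          # False
print(re.fullmatch(r"colou?r", "color") is not None)  # True
print(re.fullmatch(r"(abc)*", "abcabc") is not None)  # True

# Character classes: listed characters, ranges, and negation with ^.
print(re.fullmatch(r"[a-dX]", "c") is not None)       # True
print(re.fullmatch(r"[^a-dX]", "c") is not None)      # False
print(re.fullmatch(r"[0-9]+", "2024") is not None)    # True

# Escaping: an escaped + is a literal plus sign, not a count.
print(found(r"1\+2", "1+2=3"))   # True
print(found(r"1+2", "1+2=3"))    # False: means "one or more 1s, then 2"

# In a non-raw Python string the backslash itself must be doubled.
print("1\\+2" == r"1\+2")        # True
```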
Ad-Hoc - Full Text Searching
The full text search can handle 'Boolean Searches'. This just means you have much more control over what is searched for than with the basic 'Clinical Text Contain' option. The basic Full Text Search looks for ANY occurrence of the words you enter and returns a 'score' based on what it found, e.g. searching for 'Synulox tabs' looks for lines where either word is present. The higher the 'score', the better the match. The other option is the 'Boolean' mode: the system switches into this mode automatically if any of the following characters are entered in the search criteria: ~ ( ) < > - + * "
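The two behaviours described above can be sketched in a few lines of Python. This is an illustration only, not Data Major's implementation: the names `BOOLEAN_TRIGGER_CHARS`, `uses_boolean_mode` and `basic_match` are hypothetical, and the real search also computes a 'score', which this sketch does not attempt to reproduce.

```python
# Characters that switch the search into 'Boolean' mode, as listed above.
BOOLEAN_TRIGGER_CHARS = set('~()<>-+*"')

def uses_boolean_mode(criteria: str) -> bool:
    """Return True if the criteria contain any Boolean-mode character."""
    return any(ch in BOOLEAN_TRIGGER_CHARS for ch in criteria)

def basic_match(criteria: str, line: str) -> bool:
    """Basic mode: a line matches if ANY of the search words occurs in it."""
    search_words = criteria.lower().split()
    line_words = line.lower().split()
    return any(word in line_words for word in search_words)

print(uses_boolean_mode("Synulox tabs"))     # False: basic mode
print(uses_boolean_mode("+Synulox -tabs"))   # True: Boolean mode

print(basic_match("Synulox tabs", "Synulox 250mg given"))  # True
print(basic_match("Synulox tabs", "10 tabs dispensed"))    # True
print(basic_match("Synulox tabs", "Metacam oral"))         # False
```

One consequence of the rule as stated is that a search term containing a hyphen would also trigger Boolean mode, since - is one of the listed characters.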
The other Boolean options are listed here for reference; no examples are given.
| Operator | Description |
|---|---|
| () | Parentheses are used to group words into subexpressions. Parenthesized groups can be nested. |
| ~ | A leading tilde acts as a negation operator, causing the word's contribution to the row relevance to be negative. It's useful for marking noise words. A row that contains such a word will be rated lower than others, but will not be excluded altogether, as it would be with the - operator. |
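To make the difference between ~ and - concrete, here is a toy relevance function in Python. It is a sketch under loud assumptions: the name `toy_score`, the weights of +1 and -1, and the sample rows are all invented for illustration, and the real scoring used by the search is not documented here.

```python
from typing import Optional

def toy_score(query: str, row: str) -> Optional[float]:
    """Return a toy relevance score, or None if the row is excluded.

    Assumed weights: a plain word adds 1, a ~word subtracts 1,
    and a -word excludes the row altogether.
    """
    row_words = set(row.lower().split())
    score = 0.0
    for term in query.lower().split():
        if term.startswith("-"):
            if term[1:] in row_words:
                return None      # excluded altogether
        elif term.startswith("~"):
            if term[1:] in row_words:
                score -= 1.0     # rated lower, but still returned
        elif term in row_words:
            score += 1.0
    return score

print(toy_score("synulox ~repeat", "Synulox tabs"))                 # 1.0
print(toy_score("synulox ~repeat", "Synulox repeat prescription"))  # 0.0
print(toy_score("synulox -repeat", "Synulox repeat prescription"))  # None
```

With ~repeat the second row is still returned, only ranked below the first; with -repeat it disappears from the results.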
| Operator Type | Examples | Description |
|---|---|---|
| Literal characters (match a character exactly) | a A y 6 % @ | Letters, digits and many special characters match exactly |
| | \$ \^ \+ \\ \? | Precede other special characters with a \ to cancel their regex special meaning |
| | \n \t \r | Literal new line, tab, return |
| | \cJ \cG | Control codes |
| | \xa3 | Hex codes for any character |
| Anchors and assertions | ^ | Field starts with |
| | $ | Field ends with |
| | [[:<:]] | Word starts with |
| | [[:>:]] | Word ends with |
| Character groups (any 1 character from the group) | [aAeEiou] | Any character listed from [ to ] |
| | [^aAeEiou] | Any character except a, A, e, E, i, o or u |
| | [a-fA-F0-9] | Any hex character (0 to 9, a to f or A to F) |
| | . | Any character at all |
| | [[:space:]] | Any space character (space, \n, \r or \t) |
| | [[:alnum:]] | Any alphanumeric character (letter or digit) |
| Counts (apply to previous element) | + | 1 or more ("some") |
| | * | 0 or more ("perhaps some") |
| | ? | 0 or 1 ("perhaps a") |
| | {4} | Exactly 4 |
| | {4,} | 4 or more |
| | {4,8} | Between 4 and 8 |
| | ? after any count | Turns the count sparse (match as few as possible) instead of the default greedy |
| Alternation | \| | Either, or |
| Grouping | ( ) | Group for count and save to variable |
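The counts, the greedy versus sparse distinction, and grouping are easiest to see on sample text. The following sketch uses Python's `re` module as a stand-in, so the sample strings and variable names are hypothetical. Python does not accept the [[:<:]], [[:>:]], [[:space:]] or [[:alnum:]] forms from the table, so those are left out here.

```python
import re

# Counts apply to the previous element, here the class [0-9].
print(re.fullmatch(r"[0-9]{4}", "2024") is not None)          # True
print(re.fullmatch(r"[0-9]{4}", "20245") is not None)         # False
print(re.fullmatch(r"[0-9]{4,}", "20245") is not None)        # True
print(re.fullmatch(r"[0-9]{4,8}", "123456789") is not None)   # False

# Greedy (default) takes as much as possible; adding ? makes it sparse.
text = "<b>one</b> and <b>two</b>"
greedy = re.search(r"<.+>", text).group()
sparse = re.search(r"<.+?>", text).group()
print(greedy)   # <b>one</b> and <b>two</b>
print(sparse)   # <b>

# Parentheses group for alternation and save what each group matched.
m = re.search(r"([0-9]+)(mg|ml)", "Synulox 250mg tabs")
print(m.group(1))   # 250
print(m.group(2))   # mg

# A character group with ranges: any run of hex characters.
print(re.fullmatch(r"[a-fA-F0-9]+", "a3F0") is not None)      # True
```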