Regular Expressions

A regular expression is a powerful way of specifying a pattern for a complex search. The simplest regular expression is one that has no special characters in it. For example, the regular expression canine matches canine and nothing else.

Non-trivial regular expressions use certain special constructs so that they can match more than one string. For example, the regular expression canine|feline matches either the string canine or the string feline.

Data Major accepts these regular expressions in most places where you can enter details. The full list is given below; only some entries have worked examples.

^

Match the beginning of a string.

^Canine

The text STARTS with the word Canine

$

Match the end of a string.

Terrier$

The text ENDS with the word Terrier

de|abc

Match either of the sequences de or abc.

Canine|Feline

Matches EITHER the text Canine OR Feline

.

Match any single character.

a*

Match any sequence of zero or more a characters.

a+

Match any sequence of one or more a characters.

a?

Match either zero or one a character.

(abc)*

Match zero or more instances of the sequence abc.

[a-dX]

[^a-dX]

Matches any character that is (or is not, if ^ is used) either a, b, c, d or X. A - character between two other characters forms a range that matches all characters from the first character to the second. For example, [0-9] matches any decimal digit.
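The constructs listed above can be tried out in most languages that support regular expressions. The following is a minimal sketch using Python's standard re module, chosen only for illustration: Data Major's own engine may differ in details such as case sensitivity, and the helper name matches is hypothetical.

```python
import re

def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None

# Anchors: ^ ties the match to the start, $ to the end.
starts = matches(r"^Canine", "Canine Terrier")        # True
not_at_start = matches(r"^Canine", "Small Canine")    # False
ends = matches(r"Terrier$", "Canine Terrier")         # True

# Alternation: either sequence is accepted.
either = matches(r"Canine|Feline", "A Feline patient")  # True

# Counts: * is zero or more, + is one or more, ? is zero or one.
empty_ok = matches(r"^a*$", "")           # True
empty_not_ok = matches(r"^a+$", "")       # False
two_is_too_many = matches(r"^a?$", "aa")  # False
repeated_group = matches(r"^(abc)*$", "abcabc")  # True

# Character classes: [a-dX] and its negation [^a-dX].
in_class = matches(r"^[a-dX]$", "X")      # True
negated = matches(r"^[^a-dX]$", "e")      # True
```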

To use a literal instance of a special character in a regular expression, precede it with two backslash (\) characters. For example, to match the string 1+2, which contains the special + character, use the regular expression 1\\+2.
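A doubled backslash is commonly needed when one backslash is consumed before the pattern reaches the regex engine; whether that is the reason in Data Major is an assumption, as the text does not say. Ordinary Python string literals behave the same way, so the sketch below (Python's re module, used purely for illustration) shows the effect of escaping the + character.

```python
import re

unescaped = "1+2"    # + keeps its special meaning: one or more 1s, then 2
escaped = "1\\+2"    # the regex engine receives 1\+2, a literal plus sign

# Without escaping, the pattern does not match the literal text 1+2 ...
plain_on_literal = re.search(unescaped, "1+2")   # None
# ... but it does match a run of 1s followed by 2.
plain_on_digits = re.search(unescaped, "112")    # a match

# With escaping, only the literal text 1+2 matches.
escaped_on_literal = re.search(escaped, "1+2")   # a match
escaped_on_digits = re.search(escaped, "112")    # None

# re.escape adds the backslashes for you.
auto = re.escape("1+2")
```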

Ad-Hoc - Full text Searching

The full text search can handle 'Boolean Searches' - this just means you have much more control over what is searched for when compared with the basic 'Clinical Text Contain' option.

The basic Full Text Search looks for ANY occurrence of the words you enter and returns a 'score' based on what it found. For example, searching for 'Synulox tabs' looks for lines where either word is present. The higher the 'score', the better the match.

The other option is 'Boolean' mode. The system switches into this mode automatically if any of the following characters is entered in the search criteria: ~ ( ) < > - + * "
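The switching rule can be pictured as a simple character check. This is a minimal sketch in Python of the rule as described here, not Data Major's actual code; the names BOOLEAN_TRIGGER_CHARS and uses_boolean_mode are hypothetical.

```python
# Characters that, per the description above, trigger Boolean mode.
BOOLEAN_TRIGGER_CHARS = set('~()<>-+*"')

def uses_boolean_mode(criteria):
    """Return True if the search criteria contain any trigger character."""
    return any(ch in BOOLEAN_TRIGGER_CHARS for ch in criteria)

basic = uses_boolean_mode("Synulox tabs")         # False: plain words only
boolean = uses_boolean_mode("+Synulox -tabs")     # True: contains + and -
phrase = uses_boolean_mode('"Synulox Tabs"')      # True: contains quotes
wildcard = uses_boolean_mode("Synulo*")           # True: contains *
```

Read literally, the rule means that a search term containing a hyphen would also switch the search into Boolean mode.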

Some examples in Boolean mode:

+Synulox -tabs

Look for lines where Synulox IS present but tabs is NOT

+Synulox +tabs

Both words MUST be present, but not necessarily adjacent to each other.

"Synulox Tabs"

The exact phrase 'Synulox Tabs' (the two words separated by a space, in that order) must be found

Synulo*

Any lines containing a word that starts with Synulo. The * must appear AFTER the text.

+Synulox +(>Inject <tabs)

Find lines that contain the words 'Synulox' and 'Inject', or 'Synulox' and 'tabs' (in any order), but rank 'Inject Synulox' higher than 'Synulox Tabs'

The other Boolean options are listed here for reference; no examples are given.

()

Parentheses are used to group words into sub expressions. Parenthesized groups can be nested.

~

A leading tilde acts as a negation operator, causing the word's contribution to the row relevance to be negative. It's useful for marking noise words. A row that contains such a word will be rated lower than others, but will not be excluded altogether, as it would be with the - operator.
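To make the include and exclude rules concrete, here is a simplified model in Python. It is only a sketch under stated assumptions: it handles +word, -word, word*, "phrases" and plain words, ignores case, and leaves out parentheses, the < > ~ operators and scoring entirely. The function names are hypothetical and this is not how Data Major itself is implemented.

```python
import re

def _words(text):
    """Split text into lower-case words."""
    return re.findall(r"\w+", text.lower())

def _term_present(term, line_words):
    """Check one search term against the words of a line."""
    if term.startswith('"'):
        # Quoted phrase: the words must appear together, in order.
        phrase = _words(term)
        n = len(phrase)
        return any(line_words[i:i + n] == phrase
                   for i in range(len(line_words) - n + 1))
    if term.endswith("*"):
        # Trailing *: any word starting with the given text.
        prefix = term[:-1].lower()
        return any(w.startswith(prefix) for w in line_words)
    return term.lower() in line_words

def boolean_match(criteria, line):
    """Return True if the line satisfies the (simplified) criteria."""
    line_words = _words(line)
    required, excluded, optional = [], [], []
    for token in re.findall(r'[+-]?"[^"]+"|[+-]?[^\s"]+', criteria):
        if token[0] == "+":
            required.append(token[1:])
        elif token[0] == "-":
            excluded.append(token[1:])
        else:
            optional.append(token)
    if any(_term_present(t, line_words) for t in excluded):
        return False
    if required:
        return all(_term_present(t, line_words) for t in required)
    # With no + terms, at least one plain term must be present.
    return any(_term_present(t, line_words) for t in optional)
```

For example, boolean_match("+Synulox -tabs", "Synulox tabs 50mg") is False because the excluded word is present, while boolean_match("Synulo*", "Synulox 250mg") is True.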

Here is a more detailed list for reference.

Operator Type            Examples         Description

Literal Characters       a A y 6 % @      Letters, digits and many special characters match exactly
(match a character       \$ \^ \+ \\ \?   Precede other special characters with a \ to cancel their regex special meaning
exactly)                 \n \t \r         Literal new line, tab, return
                         \cJ \cG          Control codes
                         \xa3             Hex codes for any character

Anchors and assertions   ^                Field starts with
                         $                Field ends with
                         [[:<:]]          Word starts with
                         [[:>:]]          Word ends with

Character groups         [aAeEiou]        Any character listed from [ to ]
(any 1 character         [^aAeEiou]       Any character except a, A, e, E, i, o or u
from the group)          [a-fA-F0-9]      Any hex character (0 to 9 or a to f)
                         .                Any character at all
                         [[:space:]]      Any space character (space \n \r or \t)
                         [[:alnum:]]      Any alphanumeric character (letter or digit)

Counts                   +                1 or more ("some")
(apply to previous       *                0 or more ("perhaps some")
element)                 ?                0 or 1 ("perhaps a")
                         {4}              Exactly 4
                         {4,}             4 or more
                         {4,8}            Between 4 and 8
                         Add a ? after any count to turn it sparse (match as few as possible) rather than have it default to greedy

Alternation              |                Either, or

Grouping                 ( )              Group for count and save to variable
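A few of the entries in this list, shown as a minimal sketch in Python's re module. Python is used here only for illustration and is an assumption, not Data Major's engine; in particular, Python does not accept the [[:<:]], [[:>:]], [[:space:]] or [[:alnum:]] forms, so the sketch sticks to counts, sparse matching and grouping.

```python
import re

# Counts apply to the element immediately before them.
four_digits = re.fullmatch(r"[0-9]{4}", "2024")            # exactly 4
too_short = re.fullmatch(r"[a-fA-F0-9]{4,}", "abc")        # needs 4 or more
hex_word = re.fullmatch(r"[a-fA-F0-9]{4,8}", "DEADBEEF")   # between 4 and 8

# Greedy (the default) versus sparse matching.
text = "<b>one</b><b>two</b>"
greedy = re.search(r"<b>.+</b>", text).group()    # runs to the last </b>
sparse = re.search(r"<b>.+?</b>", text).group()   # stops at the first </b>

# Grouping saves the matched text so it can be read back.
saved = re.search(r"(Canine|Feline) patient", "Feline patient").group(1)
```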