Monday 24 November 2014

Syntax strings versus literal strings

There are a number of very important differences between syntax strings and literal strings, which I will attempt to highlight for you here.

Firstly, the Uniface definition of a syntax string...
Uniface enables you to determine if the data in a string value matches a desired pattern using syntax strings.  A syntax string is a group of characters and syntax codes enclosed in single quotation marks (').
So that's the first big difference right there: literal strings are in double quotes (") and syntax strings are in single quotes (')...

  literalString = "Literal string"
  syntaxString = 'Syntax string'

Syntax strings are used for pattern matching, but they are much simpler than the regular expressions that you might be used to in other languages.  The syntax codes are quite straightforward...

  • # - one digit (0-9)
  • & - one letter (A-Z, a-z)
  • @ - one letter, digit or underscore (A-Z, a-z, 0-9, _)
  • ~& - one extended letter
  • ~@ - one extended letter, digit or underscore
  • ? - one ASCII character
  • A-Z - that letter, in uppercase
  • a-z - that letter, in uppercase or lowercase

On top of these are a few other syntax codes of note...

  • If you want to search for the literal version of a syntax code, e.g. you want to search for the hash character (#) not a digit (0-9), then you can escape the syntax code using % before it.  Therefore, to search for a hash character it would be '%#' and to search for a percent character it would be '%%'.
  • If you want to search for an unknown number of repetitions of a syntax code, e.g. you want to search for 2 or more digits, then you can use * after it.  Therefore, to search for 2 or more digits it would be '###*' - in this case the first two hash characters represent one digit each, and the third hash character is part of the syntax code '#*', which means zero or more (0-n) digits.
  • If you want part of the pattern to be optional (to match the pattern or blank) then you can put rounded brackets around it, using ( and ).  For example, '##(#)' would match with 2 or 3 digits (but not more than 3).
  • Any other character is treated literally as that character.  
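
To make the rules above concrete, here is a rough sketch in Python (not Uniface Proc) that translates a syntax string into an ordinary regular expression.  It only covers the codes listed in this post.  The names syntax_to_regex and matches are made up for illustration, "extended letter" is approximated as any Unicode letter, and "ASCII character" is taken to mean code points 0-127, so treat it as a way of reading the rules rather than a description of how Uniface itself works.

```python
import re

# Single-character syntax codes and the regex class each is assumed to map to.
CODES = {
    "#": r"[0-9]",
    "&": r"[A-Za-z]",
    "@": r"[A-Za-z0-9_]",
    "?": r"[\x00-\x7f]",
}
# Codes that follow "~"; "extended letter" is approximated as a Unicode letter.
EXTENDED = {
    "&": r"[^\W\d_]",
    "@": r"\w",
}


def syntax_to_regex(syntax: str) -> str:
    """Translate a syntax string (without its quotes) into regex source."""
    out = []
    i = 0
    while i < len(syntax):
        ch = syntax[i]
        i += 1
        if ch == "(":            # start of an optional part
            out.append("(?:")
            continue
        if ch == ")":            # end of an optional part
            out.append(")?")
            continue
        if ch == "%" and i < len(syntax):
            atom = re.escape(syntax[i])      # escaped code: literal character
            i += 1
        elif ch == "~" and i < len(syntax) and syntax[i] in EXTENDED:
            atom = EXTENDED[syntax[i]]
            i += 1
        elif ch in CODES:
            atom = CODES[ch]
        elif "a" <= ch <= "z":
            atom = "[" + ch + ch.upper() + "]"   # lowercase: either case
        else:
            atom = re.escape(ch)                 # anything else: literal
        if i < len(syntax) and syntax[i] == "*":
            atom += "*"                          # zero or more of this code
            i += 1
        out.append(atom)
    return "".join(out)


def matches(syntax: str, value: str) -> bool:
    """True if the whole value fits the pattern."""
    return re.fullmatch(syntax_to_regex(syntax), value) is not None


print(matches("###*", "12"), matches("###*", "1"))       # True False
print(matches("##(#)", "123"), matches("##(#)", "1234"))  # True False
print(matches("%#", "#"), matches("%#", "5"))             # True False
```

The whole value has to fit the pattern, which is why re.fullmatch is used rather than re.search.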

Uniface also gives a handy function that can be used to convert a literal string into a syntax string, sensibly named $syntax...

  syntaxString = $syntax("Literal string")

This can be especially useful if you're storing the pattern in the database, or some other configuration.

Here are a few examples from the Uniface manuals...

  Proc with Syntax String          Result
  if ('#' = "123")                 FALSE
  if ('#*' = "123")                TRUE
  if ('#*' = vValue)               TRUE
  if ('#*' = "%%vValue")           TRUE
  if ('&###' = "1234")             FALSE
  if ('@###' = "1234")             TRUE
  if ('?' = "A")                   TRUE
  if ('??' = "A")                  FALSE
  if ('??*' = "A")                 TRUE
  if ('?' = "ABC")                 FALSE
  if ('??*' = "ABC")               TRUE
  if ('(#(-))&&&' = "ABC")         TRUE
  if ('(#(-))&&&' = "1ABC")        TRUE
  if ('(#(-))&&&' = "1-ABC")       TRUE
  if ('(#(-))&&&' = "12ABC")       FALSE


I've had (occasionally heated!) discussions with developers before, when they have said that...

  if ( myVar = "Y" )

...and...

  if ( myVar = 'Y' )

...are interchangeable.

Yes, I can see that the pattern 'Y' only matches with the string "Y" and nothing else, but they are entirely different ideas, and should not be used interchangeably.  If you want to only match with "Y", then use the literal string "Y", because that's what you mean.  
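
The 'Y' case happens to coincide, but going by the syntax codes listed earlier, plenty of others do not.  As a small illustration in Python (not Uniface Proc, with the two patterns hand-translated into regular expressions, which is my own modelling), compare the literal "#" with the syntax string '#', and the literal "y" with the syntax string 'y':

```python
import re

# Literal string "#": equal only to the hash character itself.
literal_hash_hits = [v for v in ["#", "7"] if v == "#"]

# Syntax string '#': the code for one digit, modelled here as [0-9].
pattern_hash_hits = [v for v in ["#", "7"] if re.fullmatch(r"[0-9]", v)]

# Literal "y": equal only to a lowercase y.
literal_y_hits = [v for v in ["y", "Y"] if v == "y"]

# Syntax string 'y': a lowercase letter in a pattern stands for either case.
pattern_y_hits = [v for v in ["y", "Y"] if re.fullmatch(r"[yY]", v)]

print(literal_hash_hits, pattern_hash_hits)  # ['#'] ['7']
print(literal_y_hits, pattern_y_hits)        # ['y'] ['y', 'Y']
```

Same characters between the quotes, different answers.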

In my experience, developers who say that these are interchangeable are developers who don't understand what the differences are.

Summary: Syntax strings are very useful for pattern matching, but very different from literal strings.  Know the difference, and know when to use which.
