Monday, 27 August 2012

Scanning is slow

I'm always complaining when I see people using a scan or $scan when they don't need to.  Yes, it can be very useful, sometimes it's unavoidable, but here's an example of when it should not be used.

If I asked you to count how many times a substring appeared within a string, you might think about doing it this way...

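  ; Count occurrences of "ABC" by scanning repeatedly
  temp = list
  total = 0
  scan temp,"ABC"
  while ( $result > 0 )
    total = total+1
    ; Drop everything up to and including the match, then scan again
    temp = temp[$result+3]
    scan temp,"ABC"
  endwhile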

It's perfectly logical code: it looks through the string, scanning for the substring and counting each iteration.  I wrote this code myself a few years back and didn't think anything of it.  I recently came across it again, as it happens to be part of an import process I use quite often.  On this particular day, however, I was importing 10,000 records, far more than usual.  Whilst I was waiting over an hour for the import to finish, I decided to check the code.

I noticed that I was using a scan and thought for a moment about what alternatives there were.  The first one I thought of seemed a little strange, but I was sure it would work, so I gave it a go.  This is what it was...

  temp = $replace($replace(list,1,"·;","",-1),1,"ABC","·;",-1)
  total = $itemcount(temp)-1

As you can see, I'm first removing any list delimiters (gold semi-colon characters) from the string, and then replacing each occurrence of the substring with the list delimiter instead.  This means that I now have a Uniface list, and I want to know how many of these delimiters there are in the string.  The easiest way to do this is to use $itemcount to count the number of items, and then deduct one, as there's always one more item than there are delimiters.  This worked a lot quicker!
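
To make that concrete, here is a minimal sketch of the same two lines applied to a made-up value.  The input string is purely illustrative (it isn't from the import), and the comments show what I'd expect each step to leave behind...

  ; Hypothetical input, chosen only for illustration
  list = "xxABCyyABCzz"
  ; Strip any existing delimiters, then turn each "ABC" into a delimiter
  temp = $replace($replace(list,1,"·;","",-1),1,"ABC","·;",-1)
  ; temp should now be "xx·;yy·;zz", which is two delimiters and three items
  total = $itemcount(temp)-1
  ; total should be 2, matching the two occurrences of "ABC"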

I've reproduced this for testing, using a string with 500 occurrences of the substring, and performing the count 500 times...

  • scan = 00:45.01, 00:43.70, 00:44.31 (around 45 seconds)
  • list = 00:00.86, 00:00.84, 00:00.84 (under 1 second)

As you can see, it's quite a staggering difference.  I hope you'll think twice before using scan again!  Obviously the loop and the rebuilding of the string are contributing to the scan timings as well, but I hope this is still a convincing argument.

Summary: Scanning a string can be essential, but it's a very costly function, so it's well worth thinking about an alternative approach.