Tuesday, April 22, 2014

Manage Data Scope

The Issue

In VBA, as in every other language I've used, from a proprietary (i.e. non-IBM) assembler through FORTRAN, Pascal, Progress, PL/1, c, and COBOL there is a facility for defining data variables and constants.  One of the questions is how that data is shared between modules, sub-processes, functions, paragraphs or whatever lingo each one uses for a chunk of code.

 "Data" defined

For purposes of  this post, "data" focuses on the definitional aspect of data more than the value aspect.  Some might call it "meta-data" but once the software is running, it means the vessel for the values as well.  Following are four classifications of data.
  1. Atomic.  Atomic data is what is defined and dealt with at the lowest level.  Any parsing of atomic data destroys it's meaning within it's definitional context. An example of atomic data is BirthDate.  Synonyms for atomic data include "column" (SQL, MS-Excel), "field" (UI), and "attribute" (Data Modeling.
  2. Complex.  This could also be called "molecular data".  Complex data is a set of attributes (atomic data)  It describes a real or conceptual object.  Synonyms include "row" (SQL, MS-Excel), "record", and "entity" (data modeling)
  3. Object.  Atomic or Complex data with Methods (what can be done with the data) and Events (when the Methods trigger).
  4. Collection.  A "collection" is just like it sounds.  It is a conceptual bucket that holds multiple instances of homogeneous atomic data, complex data or objects.  For example, in EXCEL VBA the Sheets in a Workbook are a Collection.

    An array of complex data could also be considered a collection, but not a Collection, which is an object.

Scoping

Data is "scoped", meaning it is valid only within a certain context.  when data is defined it is defined within specific Modules and procedures (i.e. Functions and Subs) or across the whole project, based on how they are declared.   

Data scoping in VBA has three levels:
  • Local data, at the sub or Function level, is considered to be the most desirable from a theoretical standpoint.  The  more tightly scoped, the less likely there will be anomalies, e.g. odd values in unexpected points of the code.
  • Global data, available to the whole project.
  • Regional data, available to the Subs and Functions within a given Module

Local Data

"Local" data is data defined within a specific subroutine (i.e. Sub or Function with a Dim or Const statement and does not use the word "Global" or "Public".    It is unknown outside of where it is declared.

Except for some specific exceptions, I always try to declare data locally then pass those values (including objects) via parameters.  This was something I learned as a college student.  It is also something I learned from my own experience supporting other people's code.  When they violated this principle finding where the data value changed can be excruciating. 

Global Data

Global data definitions and values are known across the application, aka  "Project" in VBA terms.  Think of global data as something that 

In VBA, global data is declared with one of two keywords: "Public" or  "Global" .  

I prefer "Public" for one reason:  It is used universally in VBA (e.g. for Functions, Subs, Data Types...also "Global" cannot be used in Class Modules.

When declaring  global data, whether constants or variables, it is advisable to organize it in one or more Module.  This makes maintenance much simpler because you don' need to search throughout the project to find some specific object.  Also, it makes for more consistent data naming, lower likelihood of redundant definitions

For example, I use variations on the  following two Module names in all of my projects.

  • DD_Global_Constants
  • DD_Global_Variables

Inside these Modules the data is categorized

  • All references to directories, workbooks, worksheets have their own heading.
  • Internals for a worksheet are grouped together, with Names, Column Numbers/Letters, standard header values etc each in their own place in their hierarchy. 

Regional Data

"Regional" data might also be called "Module" data in that it is defined and scoped to a VBA Module. 

VBA Subs and Functions do not stand-alone in the sense that they reside within a Module.  This does not mean a Sub or Function is necessarily dependent on the other Functions and Subs within the same Module nor that they are  isolated from Subs and Functions in other Modules.  For the most part, I simply group Subs and Functions on some theme.

However, when dealing with User Forms, Module-level data definitions are very handy because the data is held between form events, meaning that after one set of user-triggered events completes, an array or intermediate calculation can be accessed when the next series is triggered. 

Communicating Data

As alluded to above, it is highly recommended that as much data as possible be defined locally.  It is also highly recommended that software be made of components  (Functions and Subs) that each has a single purpose.  A pseudo-metric is whether the Sub or Function fits on one page.  An added advantage of the latter is a piece of code can be written and tested, then when it is needed again the Sub or Function can be called.

To achieve these ends, the solution is to pass parameters "down-the-line" and results "up-the-chain".  Parameters are the values needed to perform the operation of the Sub or Function.  Results are how the called Function/Sub reports back what it has done.

Way back when I took a two-quarter sequence we learned to draw Structure Charts as part of the design process.  These showed the calling hierarchy of procedures as well as the data that moved from one to another.  (NOTE: Call it "old school".  Think your smarter than that.  Go ahead, but if you learn to think in this mode it will give you re-useable Functions and Subs instead of just code you hunt down in some Sub, somewhere to copy-paste-troubleshoot.


Sub called "DoSomething"  that needs to find a worksheet then sort it based on the the column headers.  Above is the Structure Diagram, below are code snippets.

First, from DoSomething...
Set wsSortMe := NeededWorksheet(InputBox("Enter the name of the worksheet")
If 
wsSortMe Is Nothing Then 
    PatternSet = False
    SrtPttrn = ColHdrSortPatttern(wsSortMe, PatternSet)
    If PatternSet Then Call SortWS(wsSortMe, SrtPttrn)
End If

...and now the top lines of  the call procedures
Function NeededWorksheet(ByVal wsName as String) as Worksheet

Function ColHdrSortPatttern(ByVal ws as Worksheet, ByRef SuccessFlag as Boolean) as Variant  ' Returns an array of ColNum & Ascending/Descending PLUS a status flag which is False if  ColHdrSortPatttern is empty

Sub SortWS(ByRef ws as Worksheet, ByVal ColumnsAndDirection() as SortPattern, Optional ErrReturn as ErrObject)

Summary

Data scope is managed by 
  • Starting with the premise that tight, i.e. local, scoping provides for higher quality software and procedures with higher levels of re-usability.
  • Limiting global data to the least volatile data
  • Using regional data only as necessary
  • Using parameters instead of global and regional data.