Wednesday, March 19, 2014

SUMIFS as substitute/extension for VLOOKUP

NOTE:  The following is not applicable to EXCEL releases prior to 2007

The Problem

VLOOKUP and HLOOKUP are a wonderful database-like search tools, returning a result based on a search value by looking into a table.
    VLOOKUP(lookup_value, table_array , col_index_num, [range_lookup])

However, there are some limitations.

First Column is the key, other columns/rows follow

VLOOKUP and HLOOKUP were built with the objective of finding data in a simple table based on a key value.  It seems like MicroSoft took an example from a 1980s database book, where the key value is always shown in the leftmost column and the dependent values are to right.  

This is not a fault, but it is a limitation. It works well when the data has been normalized <wikipedia> and the primary key is atomic.  It fails when trying to do a look-up in many worksheets because 
  • There is no atomic primary key, e.g. the worksheet may have columns (or rows) for Last Name and First Name of employees but not one where it is "<LastName>, <FirstName>"
  • The worksheet is designed for presenting information in an order that makes sense to the users, not in an order that makes sense to a computer function.
  • When there is a master worksheet, some look-ups may want to find the manager of a department and in others the department of the manager. If the department is to the left of (or above) the manager the second look-up cannot be done.

Work Around 1: Redundant columns/rows

NOTE:  This is written using VLOOKUP because that is more common.  It can be adapted to HLOOKUP by considering the Rows, and where "above" is substituted for "to the left", "below" for "to the right". 

For the last bullet above, assuming the Manager is in Column K and the department name or number is in Column L, both look-ups will be possible if one of the following two options are taken
  • To the left of the Manager column insert a column, (NOTE: This puts Dept Name/Number into Column M). In the new column put a formula that make the new columns' values the same as column M, e.g. put "=M2" in the second row then copy-down for the rest of the rows.
  • To the right of the Dept Name/Number column insert a column, (NOTE: This does not change the columns in question.). In the new column put a formula that make the new columns' values the same as column K, e.g. put "=K2" in the second row then copy-down for the rest of the rows.
Once done, the new column should be hidden.

Work Around 2: Normalize

It is possible to normalize the data, that is to create multiple tables that are in 3rd (or even 5th) normal form, where redundancy is limited.  

Normalizing improves maintainability, e.g. if the department name is changed it only needs to be changes in the "Department table" and not in every row where a person works for that department.  

Also, rather than typing/pasting a department name, the worksheets can be set-up to use a drop-down. Unfortunately, since the selected drop-down values do not automatically update to the new values, this means an additional bit of code is needed when the values in the referenced tables are changed

On the other hand, normalized data can be confusing to users.  People are used to doing things a certain way, e.g. a Find/Replace of one department name for another, and may find the new methods cumbersome.

Work Around 3: Concatenated Keys

As mentioned in the example about employee names, sometimes a key is needed that is a combination of multiple columns.  Another example would be departmental funding that covers multiple years or months.

In these cases a column can be inserted to the left of the results column.  These can be information that the users might find good to see, e.g. the FirstName-comma-space-Lastname.  They can also seem redundant and sometimes just ugly, e.g. a departmentnumber-year-month concatentation.

In the latter cases, the columns can be hidden.

A New Solution

Importance of CountIfs and SumIfs to V/HLookup

There are two differences in the COUNTIFS and SUMIFS  functions that make them useful in implementing VLookup and HLookup.
  • Column/row order is not important.  This was true with COUNTIF and SUMIF.
  • COUNTIFS and SUMIFS can search multiple columns.
By using the results of COUNTIFS or SUMIFS ca

Setup

Add the Index Column

This solution calls for 1 new column.  Column A is the indexing column for the VLOOKUP.  Insert a column at A then put "=ROW()" in column A for all rows in the table.  This single, simple column replaces all of the alternative key columns that might crop up in a large spreadsheet

Name the columns

Putting Names on the columns has some advantages.  First, it makes the formulas more readable.  Second, if the columns are shifted or rearranged the formulas continue to work.

In this example I have a worksheet (People)  with the following Names.  
  • Index =People!$A:$A
  • LastName =People!$B:$B
  • FirstName =People!$C:$C
  • HowMuch =People!$D:$D
  • Folks=People!$A:$D
The only necessity is that Index is the leftmost column within Folks, and Folks be defined as including column A.  Allowing Folks and Index to be anywhere else would complicate the formulas, perhaps drastically.

The target worksheet consists of three columns:  
  • A - Last name
  • B - First name
  • C - VLOOKUP (How Much)

The Lookup

How it works

In short, the LOOKUP uses the value in the Index column to return what is in the HowMuch Column.   The value for the Index to use is the result of a SUMIF function.

From the inside out, the SUMIFS returns the row number and the LOOKUP uses the row number to find the right row then look across to the right column.

Building the VLOOKUP

  • Starting point:
    VLOOKUP(lookup_value, table_Array , col_index_num, [range_lookup])
  •  [range_lookup] = FALSE. False is selected because we want only exact matches.
    VLOOKUP(lookup_value, table_Array , col_index_num, FALSE)
  • Table_Array = Folks. Folks is the whole table, from Index in column A to the rightmost column, in this case column D.
    VLOOKUP(lookup_value, Folks,Col_index_num , FALSE)
  • Col_index_num = COLUMN(HowMuch).  Column(HowMuch) returns the column number for the How Much column, which is the offset needed by the Lookup function.  This is the primary reason for the limitations on table and Index placements.
    VLOOKUP(lookup_value, Folks,COLUMN(HowMuch), FALSE)
  • lookup_value = SUMIFS(Index,FirstName,B1,LastName,A1).  By using SUMIFS the sum is the value in Index because only one row is found. (See below for handling this error. This assumes the combinations of FirstName and LastName are unique for each row.
    VLOOKUP(SUMIFS(Index,FirstName,B1,LastName,A1), Folks,COLUMN(HowMuch), FALSE)

The Add-ons: Error handling

This solution depends on (A) unique rows in the Folks table and (B) an entry in the Folks table for every First/Last combination feeding the SUMIFS.  The former would result in the wrong value for lookup_value because the SUMIF would have Indexes for multiple rows and add them together and point the VLOOKUP to the wrong row.  The latter would result in a 0, which would result in an "#N/A" error from VLOOKUP. 

Both can be handled with the COUNTIFS function
  • The First Name and Last Name used in the SUMIF may not match any  rows in Folks.  To identify these, test with a COUNTIF that checks if there are no (zero) matching rows, e.g.  IF(COUNTIFS(FirstName,A2,LastName,B2)=0,"Not one of ours!"
  • The Folks table may have two rows with the same first and last name.  To provide a message when this is encountered, use COUNTIFS to see if there are more than one IF(COUNTIFS(FirstName,A2,LastName,B2)>1,"Two people, same name"

Altogether


=IF(COUNTIFS(FirstName,A2,LastName,B2)=0,"Not one of ours!",
IF(COUNTIFS(FirstName,A2,LastName,B2)>1,"Two people, same name",
VLOOKUP(SUMIFS(Index,FirstName,B1,LastName,A1)))