Post

A Generic Untyped To Typed Table Load which does automatic Type Checking and error redirection

In SQL Server 2008 on April 11, 2010 by ashwaniroy

Loading data from untyped tables to tables is a very frequently occurring scenario in a Data warehouse application.Normally we load all the data we need in a table which is NVARCHAR or VARCHAR column and then check , row-by-row if all the data satisfies the data type check. If the row passes the type check validation then it is loaded to the typed else it is flagged as INVALID and / or is redirected to another table where we have error out records. So far , so good…But there are few things which are quite painful about this processes.

No 1:- You have to write some kind of LOOP operation for this .

No 2:- You need to write code for each untyped to typed table load.

Even if you use SSIS and use error redirect the INSERT operation becomes row-by-row and no longer is set based which again hits the performance.

If we look at the problem scenario we have

1. A Untyped Table

2. A Typed Table

3. A Flag Column to mark Invalid records

4. A table where all errors go

I am going to try to write a Stored Procedure that does that just. Pass any typed table to it and untyped typed and / or error table , Is_Valid column name (the 4 parameters mentioned above) and it will just find out all the rows that satisfy the type check validation and load then to typed table and will mark the other rows as invalid.

So as long as you pass the typed table name and corresponding untyped table name , where all the column of typed table are in the untyped table then this SP should work .

Lets Create 2 tables Untyped and Typed

CREATE  TABLE UNTYPED
(
 ID      VARCHAR(20)
,TITLE   VARCHAR(20)
,NAME    VARCHAR(20)
,SALARY  VARCHAR(20)
,IS_VALID BIT
) 

CREATE  TABLE TYPED
(
 ID      INT
,TITLE   VARCHAR(20)
,NAME    VARCHAR(20)
,SALARY  FLOAT
) 

INSERT INTO UNTYPED VALUES('1','Mr','Ash','100.00',NULL)
INSERT INTO UNTYPED VALUES('2','Mr','John','300.00',NULL)
INSERT INTO UNTYPED VALUES('3','Mr','Doe','400.00',NULL)
INSERT INTO UNTYPED VALUES('4','Mr','Bell','500.00',NULL)
INSERT INTO UNTYPED VALUES('5','Mr','Jack','150.00',NULL)
INSERT INTO UNTYPED VALUES('6','Miss','Jane','200.00',NULL)
INSERT INTO UNTYPED VALUES('7','Mrs','Ann','130.00',NULL)
INSERT INTO UNTYPED VALUES('8','12','Judy','Foo',NULL)
INSERT INTO UNTYPED VALUES('Bar','13','Jude','BAR',NULL) 

So The Data in the untyped table Now is

clip_image001

 

So This is this is the SP . Pleas read the comments. You may have to extend it a bit and might like to add more error handling etc.

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[usp_load_untyped_to_typed_staging]
                   @stg_unTyped_table_name NVARCHAR(256)
                 , @stg_Typed_table_name NVARCHAR(256)
                 , @is_valid_column_name    NVARCHAR(256)
                 , @i_debugOnly INT = 0
AS
/**
 * <Example>
   EXEC  [dbo].[usp_load_untyped_to_typed_staging]
                 @stg_typed_table_name     =  'typed'
                ,@stg_unTyped_table_name   =  'untyped'
                ,@is_valid_column_name     =  'IS_VALID'
                ,@i_debugOnly               =   1

 * <Comment>.
 * As of now it does type checking for only INT and FLOAT bit it can be easily extended
 * It does not do NOT NULL Validation as of now  but it can be extended
 * ----------------
 *
 * Ver    Author            Date        Comments
 * ---    -------------------    -----------    ------------------------------
 *   1    Ashwani Roy     09-04-2010   Load From Untyped Staging to Typed Staging with type checking and error flagging
 */
SET NOCOUNT ON;

DECLARE    @sql_script NVARCHAR(MAX)

DECLARE
        -- Define user variables.
        @ERR_GENERAL          INT,
        @ERR_OBJ_NULL          INT,
        @ErrCode              INT = 0,
        @ErrMsg               NVARCHAR(4000),
        @StepNbr              SMALLINT = 0,
        @SQL                  VARCHAR(MAX);
SET     @ERR_GENERAL          =99
SET     @ERR_OBJ_NULL         =0

DECLARE @numeridTypes          TABLE  (datatype NVARCHAR(255))
INSERT INTO @numeridTypes
SELECT 'INT'
UNION ALL
SELECT 'FLOAT'

BEGIN
 /**
  * STEP 1
  * Set variables.
  */
  SET @StepNbr += 1;
  SET @ErrMsg = N'ERROR - Stored procedure [' + OBJECT_NAME(@@PROCID) + N'] ';

 /**
  * STEP 2
  * CREATE DYNAMIC SQL FOR INSERT
  * MY AUTOMATIC TYPE CHECKING PATTERN FOR SQL
  */
    SELECT
         COLUMN_NAME
        ,DATA_TYPE =
                CASE WHEN DATA_TYPE = 'VARCHAR' THEN DATA_TYPE+'(' +CAST(CHARACTER_MAXIMUM_LENGTH AS VARCHAR(5)) +')'
                ELSE DATA_TYPE
        END
        ,IS_NULLABLE
    INTO #temp_columns
    FROM INFORMATION_SCHEMA.COLUMNS (NOLOCK) WHERE TABLE_NAME  =@stg_typed_table_name 

IF(@@ROWCOUNT=0)
BEGIN
    SET @ErrCode = @ERR_OBJ_NULL
    SET @ErrMsg = 'TYPED TABLE :'+'['+@stg_typed_table_name+']'+N'DOES NOT EXISTS'
    GOTO FAIL;
END

    --Flag the records that Fail Data type Validation as INVALID--
    SET @sql_script = N''
    SET @sql_script= @sql_script+ 'UPDATE'+'['+ @stg_unTyped_table_name+']'+CHAR(13)
    SET @sql_script= @sql_script+ 'SET'+CHAR(9)+@is_valid_column_name+'=0'+CHAR(13)
    SET @sql_script= @sql_script+ 'WHERE '+STUFF((SELECT 'OR '+ '1 = CASE WHEN ISNULL(LEN('+COLUMN_NAME+'),0) != 0
    AND ISNUMERIC('+COLUMN_NAME+')= 0 THEN 1 WHEN ISNULL(LEN('+COLUMN_NAME+'),0) = 0    THEN 1    ELSE 0 END  '
    FROM #temp_columns
    WHERE DATA_TYPE IN(SELECT datatype FROM @numeridTypes) FOR XML PATH('')),1, 3, '')

BEGIN TRY
    IF @i_debugOnly = 1
    BEGIN
        PRINT @sql_script
    END
    ELSE
    BEGIN
        EXEC SP_EXECUTESQL @sql_script
    END
END TRY
BEGIN CATCH
    IF(@ErrCode NOT IN (0,1))
        SET @ErrCode = @ERR_GENERAL;
    SET @ErrMsg = @ErrMsg + N'raised error (' + CAST(ERROR_NUMBER() AS NVARCHAR) + N') in step number ('
    + CAST(@StepNbr AS NVARCHAR) + N') at line number ('
    + CAST(ERROR_LINE() AS NVARCHAR) + N') with message: ' + ERROR_MESSAGE() ;
    GOTO FAIL;
END CATCH;

    --INSERT THE GOOD ONES IN typed--
    --If the Untyped is added with a IS_VALID column then the below SQL can be optimized to JUST a simple SELECT WHERE IS_VALID = 1--
    SET @sql_script = N''
    SET @sql_script= @sql_script+ 'INSERT INTO'+'['+ @stg_typed_table_name+']'+CHAR(13)
    SET @sql_script= @sql_script+ 'SELECT'+CHAR(13)
    SET @sql_script= @sql_script+ STUFF((SELECT ', ' + '['+COLUMN_NAME+']' FROM #temp_columns FOR XML PATH('')),1, 2, '')
                     +CHAR(13)
    SET @sql_script= @sql_script+ 'FROM'+CHAR(9)+'['+@stg_unTyped_table_name+']'+CHAR(13)
    SET @sql_script= @sql_script+ 'WHERE '+STUFF((SELECT 'AND '+ '1 = CASE WHEN ISNULL(LEN('+COLUMN_NAME+'),0) != 0
                     AND ISNUMERIC('+COLUMN_NAME+')= 1 THEN 1 WHEN ISNULL(LEN('+COLUMN_NAME+'),0) = 0    THEN 1    ELSE 0 END  '
                     FROM #temp_columns
                     WHERE DATA_TYPE IN(SELECT datatype FROM @numeridTypes) FOR XML PATH('')),1, 3, '')

BEGIN TRY
    IF @i_debugOnly = 1
    BEGIN
        PRINT @sql_script
    END
    ELSE
    BEGIN
        EXEC SP_EXECUTESQL @sql_script
    END
END TRY
BEGIN CATCH
    IF(@ErrCode NOT IN (0,1))
        SET @ErrCode = @ERR_GENERAL;
    SET @ErrMsg = @ErrMsg + N'raised error (' + CAST(ERROR_NUMBER() AS NVARCHAR) + N')
    in step number (' + CAST(@StepNbr AS NVARCHAR) + N') at line number ('
                 + CAST(ERROR_LINE() AS NVARCHAR) + N') with message: ' + ERROR_MESSAGE() ;
    GOTO FAIL;
END CATCH;

GOTO SUCCESS;

FAIL:
  RAISERROR(@ErrMsg, 16, 1);
  RETURN @ErrCode;

SUCCESS:
  -- Commit the transaction.
  RETURN 0;

DROP TABLE #temp_columns
END;

Now when you run this SP in DEBUG MODE as show in the example in the SP code itself

It will print out the SQL that it generated. Setting debug_only to 0 will run this. The SQL generated is like

UPDATE[untyped]
SET    IS_VALID=0
WHERE 1 = CASE WHEN ISNULL(LEN(ID),0) != 0 AND ISNUMERIC(ID)= 0 THEN 1 WHEN ISNULL(LEN(ID),0) = 0
THEN 1    ELSE 0 END  OR 1 = CASE WHEN ISNULL(LEN(SALARY),0) != 0 AND ISNUMERIC(SALARY)= 0 THEN 1
WHEN ISNULL(LEN(SALARY),0) = 0    THEN 1    ELSE 0 END  

INSERT INTO[typed]
SELECT
[ID], [TITLE], [NAME], [SALARY]
FROM    [untyped]
WHERE  1 = CASE WHEN ISNULL(LEN(ID),0) != 0 AND ISNUMERIC(ID)= 1 THEN 1 WHEN ISNULL(LEN(ID),0) = 0    THEN 1
ELSE 0 END
AND 1 = CASE WHEN ISNULL(LEN(SALARY),0) != 0 AND ISNUMERIC(SALARY)= 1 THEN 1 WHEN ISNULL(LEN(SALARY),0) = 0    THEN 1
ELSE 0 END

You can see how using dynamic SQL and Information schemas a nice Type check is generated and another

UPDATE to set the validation column = 0 is also created. Best Part is table names and Validation Column names and ISNUMRIC validation types all are configurable.

Now I will run the query with Debug flag = 0 and you would see that the output is as expected. Out of 9 ROWS 7 Will be loaded in Typed Tabled and 2 Will be marked as invalid in the untyped table.

clip_image002 

There you go. Job Done.

Hope you find it useful and will be able to extend to do more validation , checks and Error Redirect type stuff using one SP making you life a little easier.

 

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.