Loading data from untyped tables to tables is a very frequently occurring scenario in a Data warehouse application.Normally we load all the data we need in a table which is NVARCHAR or VARCHAR column and then check , row-by-row if all the data satisfies the data type check. If the row passes the type check validation then it is loaded to the typed else it is flagged as INVALID and / or is redirected to another table where we have error out records. So far , so good…But there are few things which are quite painful about this processes.
No 1:- You have to write some kind of LOOP operation for this .
No 2:- You need to write code for each untyped to typed table load.
Even if you use SSIS and use error redirect the INSERT operation becomes row-by-row and no longer is set based which again hits the performance.
If we look at the problem scenario we have
1. A Untyped Table
2. A Typed Table
3. A Flag Column to mark Invalid records
4. A table where all errors go
I am going to try to write a Stored Procedure that does that just. Pass any typed table to it and untyped typed and / or error table , Is_Valid column name (the 4 parameters mentioned above) and it will just find out all the rows that satisfy the type check validation and load then to typed table and will mark the other rows as invalid.
So as long as you pass the typed table name and corresponding untyped table name , where all the column of typed table are in the untyped table then this SP should work .
Lets Create 2 tables Untyped and Typed
CREATE TABLE UNTYPED
(
ID VARCHAR(20)
,TITLE VARCHAR(20)
,NAME VARCHAR(20)
,SALARY VARCHAR(20)
,IS_VALID BIT
)
CREATE TABLE TYPED
(
ID INT
,TITLE VARCHAR(20)
,NAME VARCHAR(20)
,SALARY FLOAT
)
INSERT INTO UNTYPED VALUES('1','Mr','Ash','100.00',NULL)
INSERT INTO UNTYPED VALUES('2','Mr','John','300.00',NULL)
INSERT INTO UNTYPED VALUES('3','Mr','Doe','400.00',NULL)
INSERT INTO UNTYPED VALUES('4','Mr','Bell','500.00',NULL)
INSERT INTO UNTYPED VALUES('5','Mr','Jack','150.00',NULL)
INSERT INTO UNTYPED VALUES('6','Miss','Jane','200.00',NULL)
INSERT INTO UNTYPED VALUES('7','Mrs','Ann','130.00',NULL)
INSERT INTO UNTYPED VALUES('8','12','Judy','Foo',NULL)
INSERT INTO UNTYPED VALUES('Bar','13','Jude','BAR',NULL)
So The Data in the untyped table Now is
So This is this is the SP . Pleas read the comments. You may have to extend it a bit and might like to add more error handling etc.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[usp_load_untyped_to_typed_staging]
@stg_unTyped_table_name NVARCHAR(256)
, @stg_Typed_table_name NVARCHAR(256)
, @is_valid_column_name NVARCHAR(256)
, @i_debugOnly INT = 0
AS
/**
* <Example>
EXEC [dbo].[usp_load_untyped_to_typed_staging]
@stg_typed_table_name = 'typed'
,@stg_unTyped_table_name = 'untyped'
,@is_valid_column_name = 'IS_VALID'
,@i_debugOnly = 1
* <Comment>.
* As of now it does type checking for only INT and FLOAT bit it can be easily extended
* It does not do NOT NULL Validation as of now but it can be extended
* ----------------
*
* Ver Author Date Comments
* --- ------------------- ----------- ------------------------------
* 1 Ashwani Roy 09-04-2010 Load From Untyped Staging to Typed Staging with type checking and error flagging
*/
SET NOCOUNT ON;
DECLARE @sql_script NVARCHAR(MAX)
DECLARE
-- Define user variables.
@ERR_GENERAL INT,
@ERR_OBJ_NULL INT,
@ErrCode INT = 0,
@ErrMsg NVARCHAR(4000),
@StepNbr SMALLINT = 0,
@SQL VARCHAR(MAX);
SET @ERR_GENERAL =99
SET @ERR_OBJ_NULL =0
DECLARE @numeridTypes TABLE (datatype NVARCHAR(255))
INSERT INTO @numeridTypes
SELECT 'INT'
UNION ALL
SELECT 'FLOAT'
BEGIN
/**
* STEP 1
* Set variables.
*/
SET @StepNbr += 1;
SET @ErrMsg = N'ERROR - Stored procedure [' + OBJECT_NAME(@@PROCID) + N'] ';
/**
* STEP 2
* CREATE DYNAMIC SQL FOR INSERT
* MY AUTOMATIC TYPE CHECKING PATTERN FOR SQL
*/
SELECT
COLUMN_NAME
,DATA_TYPE =
CASE WHEN DATA_TYPE = 'VARCHAR' THEN DATA_TYPE+'(' +CAST(CHARACTER_MAXIMUM_LENGTH AS VARCHAR(5)) +')'
ELSE DATA_TYPE
END
,IS_NULLABLE
INTO #temp_columns
FROM INFORMATION_SCHEMA.COLUMNS (NOLOCK) WHERE TABLE_NAME =@stg_typed_table_name
IF(@@ROWCOUNT=0)
BEGIN
SET @ErrCode = @ERR_OBJ_NULL
SET @ErrMsg = 'TYPED TABLE :'+'['+@stg_typed_table_name+']'+N'DOES NOT EXISTS'
GOTO FAIL;
END
--Flag the records that Fail Data type Validation as INVALID--
SET @sql_script = N''
SET @sql_script= @sql_script+ 'UPDATE'+'['+ @stg_unTyped_table_name+']'+CHAR(13)
SET @sql_script= @sql_script+ 'SET'+CHAR(9)+@is_valid_column_name+'=0'+CHAR(13)
SET @sql_script= @sql_script+ 'WHERE '+STUFF((SELECT 'OR '+ '1 = CASE WHEN ISNULL(LEN('+COLUMN_NAME+'),0) != 0
AND ISNUMERIC('+COLUMN_NAME+')= 0 THEN 1 WHEN ISNULL(LEN('+COLUMN_NAME+'),0) = 0 THEN 1 ELSE 0 END '
FROM #temp_columns
WHERE DATA_TYPE IN(SELECT datatype FROM @numeridTypes) FOR XML PATH('')),1, 3, '')
BEGIN TRY
IF @i_debugOnly = 1
BEGIN
PRINT @sql_script
END
ELSE
BEGIN
EXEC SP_EXECUTESQL @sql_script
END
END TRY
BEGIN CATCH
IF(@ErrCode NOT IN (0,1))
SET @ErrCode = @ERR_GENERAL;
SET @ErrMsg = @ErrMsg + N'raised error (' + CAST(ERROR_NUMBER() AS NVARCHAR) + N') in step number ('
+ CAST(@StepNbr AS NVARCHAR) + N') at line number ('
+ CAST(ERROR_LINE() AS NVARCHAR) + N') with message: ' + ERROR_MESSAGE() ;
GOTO FAIL;
END CATCH;
--INSERT THE GOOD ONES IN typed--
--If the Untyped is added with a IS_VALID column then the below SQL can be optimized to JUST a simple SELECT WHERE IS_VALID = 1--
SET @sql_script = N''
SET @sql_script= @sql_script+ 'INSERT INTO'+'['+ @stg_typed_table_name+']'+CHAR(13)
SET @sql_script= @sql_script+ 'SELECT'+CHAR(13)
SET @sql_script= @sql_script+ STUFF((SELECT ', ' + '['+COLUMN_NAME+']' FROM #temp_columns FOR XML PATH('')),1, 2, '')
+CHAR(13)
SET @sql_script= @sql_script+ 'FROM'+CHAR(9)+'['+@stg_unTyped_table_name+']'+CHAR(13)
SET @sql_script= @sql_script+ 'WHERE '+STUFF((SELECT 'AND '+ '1 = CASE WHEN ISNULL(LEN('+COLUMN_NAME+'),0) != 0
AND ISNUMERIC('+COLUMN_NAME+')= 1 THEN 1 WHEN ISNULL(LEN('+COLUMN_NAME+'),0) = 0 THEN 1 ELSE 0 END '
FROM #temp_columns
WHERE DATA_TYPE IN(SELECT datatype FROM @numeridTypes) FOR XML PATH('')),1, 3, '')
BEGIN TRY
IF @i_debugOnly = 1
BEGIN
PRINT @sql_script
END
ELSE
BEGIN
EXEC SP_EXECUTESQL @sql_script
END
END TRY
BEGIN CATCH
IF(@ErrCode NOT IN (0,1))
SET @ErrCode = @ERR_GENERAL;
SET @ErrMsg = @ErrMsg + N'raised error (' + CAST(ERROR_NUMBER() AS NVARCHAR) + N')
in step number (' + CAST(@StepNbr AS NVARCHAR) + N') at line number ('
+ CAST(ERROR_LINE() AS NVARCHAR) + N') with message: ' + ERROR_MESSAGE() ;
GOTO FAIL;
END CATCH;
GOTO SUCCESS;
FAIL:
RAISERROR(@ErrMsg, 16, 1);
RETURN @ErrCode;
SUCCESS:
-- Commit the transaction.
RETURN 0;
DROP TABLE #temp_columns
END;
Now when you run this SP in DEBUG MODE as show in the example in the SP code itself
It will print out the SQL that it generated. Setting debug_only to 0 will run this. The SQL generated is like
UPDATE[untyped]
SET IS_VALID=0
WHERE 1 = CASE WHEN ISNULL(LEN(ID),0) != 0 AND ISNUMERIC(ID)= 0 THEN 1 WHEN ISNULL(LEN(ID),0) = 0
THEN 1 ELSE 0 END OR 1 = CASE WHEN ISNULL(LEN(SALARY),0) != 0 AND ISNUMERIC(SALARY)= 0 THEN 1
WHEN ISNULL(LEN(SALARY),0) = 0 THEN 1 ELSE 0 END
INSERT INTO[typed]
SELECT
[ID], [TITLE], [NAME], [SALARY]
FROM [untyped]
WHERE 1 = CASE WHEN ISNULL(LEN(ID),0) != 0 AND ISNUMERIC(ID)= 1 THEN 1 WHEN ISNULL(LEN(ID),0) = 0 THEN 1
ELSE 0 END
AND 1 = CASE WHEN ISNULL(LEN(SALARY),0) != 0 AND ISNUMERIC(SALARY)= 1 THEN 1 WHEN ISNULL(LEN(SALARY),0) = 0 THEN 1
ELSE 0 END
You can see how using dynamic SQL and Information schemas a nice Type check is generated and another
UPDATE to set the validation column = 0 is also created. Best Part is table names and Validation Column names and ISNUMRIC validation types all are configurable.
Now I will run the query with Debug flag = 0 and you would see that the output is as expected. Out of 9 ROWS 7 Will be loaded in Typed Tabled and 2 Will be marked as invalid in the untyped table.
There you go. Job Done.
Hope you find it useful and will be able to extend to do more validation , checks and Error Redirect type stuff using one SP making you life a little easier.