Reputation: 87
I have a table, it has three columns, school_name, school_number, total_Enrollments. However, total_Enrollments was loaded as a VARCHAR column Data Type. Because of this the column has extra spaces and characters.
Scenario: School A has 621 enrollments but the field was loaded with an extra blank space to 631_, the underscore represents the blank field, when I try Summing this up, it will not because it is a VARCHAR so I cast it as INT in order to sum it up (discloser: I was not the one that designed this table this way, the field should have been numeric or INT from the get-go) but it fails.
Solution: I cannot rebuild this table, is there a way through T-SQL that I can change the data type and remove weird characters and extra spaces so that I may use a SUM function to tally the column?
Upvotes: 0
Views: 126
Reputation: 106
1) If you can asure that your required number is on the very left side of the string and you have multiple non-numeric character you can use this:
DROP TABLE IF EXISTS #source;
CREATE TABLE #source(total_Enrollments VARCHAR(100));
INSERT INTO #source VALUES ('101'), ('102_'), ('103 ');
INSERT INTO #source VALUES ('200' + CHAR(9)), ('201' + CHAR(10)), ('202' + CHAR(13));
INSERT INTO #source VALUES (CHAR(9) + '300'), (CHAR(10) + '301'), (CHAR(13) + '303');
SELECT
total_Enrollments =
CASE WHEN PATINDEX('%[^0-9]%', total_Enrollments) = 0
THEN CAST(total_Enrollments AS INT)
ELSE CAST(LEFT(total_Enrollments, PATINDEX('%[^0-9]%', total_Enrollments ) - 1) AS INT) END
, pos = PATINDEX('%[^0-9]%', total_Enrollments)
, orig = total_Enrollments
FROM #source
PATINDEX('%[^0-9]%') use RegEx where %[^0-9]% find the first position of any character that is NOT (expressed by ^) in the range between 0 and 9. Unfortunately the REPLACE-Function can't handle RegEx to replace non-numeric character at multiple positions.
2) If you have to handle CHAR(0) you can do this by replacing it with the REPLACE-Function, below done this in an sub-select.
DROP TABLE IF EXISTS #source;
CREATE TABLE #source(total_Enrollments VARCHAR(100));
INSERT INTO #source VALUES (CHAR(0) + '400' + CHAR(9));
SELECT
total_Enrollments =
CASE WHEN PATINDEX('%[^0-9]%', total_Enrollments) = 0
THEN CAST(total_Enrollments AS INT)
ELSE CAST(LEFT(total_Enrollments, PATINDEX('%[^0-9]%', total_Enrollments ) - 1) AS INT) END
, pos = PATINDEX('%[^0-9]%', total_Enrollments)
, orig = total_Enrollments
FROM (SELECT total_Enrollments = REPLACE(total_Enrollments , CHAR(0), '') FROM #source) AS sub
Please note that REPLACE will find and replace CHAR(10) only if it occures at the beginning of the string. If it occures in any other position you have to use an other collation
SELECT total_Enrollments = REPLACE(total_Enrollments COLLATE SQL_Latin1_General_CP1_CI_AS, CHAR(0), '') FROM #source
Upvotes: 1
Reputation: 41
You must try updating the table
UPDATE <your_table>
SET total_Enrollments = REPLACE(total_Enrollments , CHAR(0), '')
WHERE CHARINDEX(CHAR(0), total_Enrollments ) > 0;
But then you'll also need to fix whatever is putting this bad data into the table
SELECT CONVERT(INT, REPLACE(total_Enrollments , CHAR(0), ''))
FROM <your_table>;
Upvotes: 0