Reputation: 19
I need to finalize a query. The query returns a column which contains values like "P100+P200" or "SUMME(P400:P1200)".
In the end, the result should be:
Column A | Column B | Column C |
---|---|---|
P100 | + | P200 |
P400 | : | P1200 |
I have managed to extract column A and column B.
For these first two steps I used this code:
MAX (SUBSTRING(t3.formel, PATINDEX('%[A-Z][0-9]%', t3.formel), PATINDEX('%[+:-]%', SUBSTRING(t3.formel, PATINDEX('%[A-Z][0-9]%', t3.formel), LEN(t3.formel))) - 1)) "Formelteil 1",
MAX (SUBSTRING(t3.formel, PATINDEX('%[+:.-]%', t3.formel), 1) ) AS Sonderzeichen
But I can't see the solution for the third step.
Upvotes: 0
Views: 148
Reputation: 131374
T-SQL isn't a text manipulation language and doesn't even have regular expressions. It's a lot easier to do this task in a client language, using a regular expression like ([A-Z\d]+)([+:.-])([A-Z\d]+) to capture the three parts.
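As a minimal sketch of how that pattern splits a value, here is a plain-Python version using only the standard re module. The helper name split_formula is hypothetical and only serves the illustration; the pattern is the one quoted above.

```python
import re

# Pattern from the answer: operand, one operator character, operand.
PATTERN = re.compile(r"([A-Z\d]+)([+:.-])([A-Z\d]+)")

def split_formula(formel):
    """Return (left, operator, right), or None when the pattern does not match.

    split_formula is a hypothetical helper name used only for this sketch.
    """
    match = PATTERN.search(formel)
    return match.groups() if match else None

for value in ["P100+P200", "SUMME(P400:P1200)"]:
    print(value, "->", split_formula(value))
```

Because search scans for the first position where all three groups match, the leading SUMME( is skipped without any extra handling.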
In the comments you mention the data is used in Power BI. You can use a Python transformation in the Query Editor to apply a regular expression to the data using pandas' str.extract and automatically extract the parts into columns.
The Power BI step script is essentially a one-liner:
import pandas as pd
pattern=r"([A-Z\d]+)([+:.-])([A-Z\d]+)"
dataset[['a','b','c']]=dataset['formel'].str.extract(pattern)
str.extract applies the regular expression to all the values of the formel column (a Series) and extracts each capture group into a separate column. dataset[['a','b','c']]= stores those columns in the original dataset using the names a, b and c.
You can easily test Python scripts in the command line or in a Jupyter Notebook in VS Code.
The following script, run in either the Python command line or a notebook in VS Code:
import pandas as pd
dataset=pd.DataFrame({'formel':['P100+P400','SUMME(P200:P300)']})
pattern=r"([A-Z\d]+)([+:.-])([A-Z\d]+)"
dataset[['a','b','c']]=dataset['formel'].str.extract(pattern)
dataset
Prints
formel a b c
0 P100+P400 P100 + P400
1 SUMME(P200:P300) P200 : P300
Upvotes: -2
Reputation: 2853
As mentioned in the comments, this is not really a job for SQL Server.
When asking questions like this it's helpful to provide example DDL/DML:
DECLARE @Table TABLE (formel NVARCHAR(100));
INSERT INTO @Table (formel) VALUES
('P100+P200'), ('G100/G200'), ('a100*z200'), ('P1005-P2005'), ('SUMME(P400:P1200)');
You're two thirds of the way there. Since the only additional character we seem to need to worry about is the closing parenthesis, we can take the position of the operator + 1 as the start of the last part, pass a length at least as large as the number of remaining characters (here LEN(t3.formel)), and then replace the closing parenthesis with an empty string:
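SELECT t3.formel,
    -- a: from the first letter+digit pair up to (not including) the operator
    SUBSTRING(t3.formel, PATINDEX('%[A-Za-z][0-9]%', t3.formel), PATINDEX('%[-*/+:]%', t3.formel) - PATINDEX('%[A-Za-z][0-9]%', t3.formel)) AS a,
    -- b: the single operator character
    SUBSTRING(t3.formel, PATINDEX('%[-*/+:]%', t3.formel), 1) AS b,
    -- c: everything after the operator, with the closing parenthesis removed
    REPLACE(SUBSTRING(t3.formel, PATINDEX('%[-*/+:]%', t3.formel) + 1, LEN(t3.formel)), ')', '') AS c
FROM @Table t3;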
formel | a | b | c |
---|---|---|---|
P100+P200 | P100 | + | P200 |
G100/G200 | G100 | / | G200 |
a100*z200 | a100 | * | z200 |
P1005-P2005 | P1005 | - | P2005 |
SUMME(P400:P1200) | P400 | : | P1200 |
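To make the position arithmetic behind the three expressions easier to follow, here is a rough Python sketch of the same logic. It is not part of the T-SQL solution; the function name split_by_position is hypothetical, and the sketch assumes every value contains a letter followed by a digit and at least one operator character.

```python
import re

def split_by_position(formel):
    """Mirror the position-based T-SQL logic (hypothetical helper for illustration).

    Assumes the value contains a letter+digit pair and an operator;
    otherwise re.search returns None and .start() raises AttributeError.
    """
    # Like PATINDEX('%[A-Za-z][0-9]%', formel): first letter followed by a digit.
    start = re.search(r"[A-Za-z][0-9]", formel).start()
    # Like PATINDEX('%[-*/+:]%', formel): first operator character.
    op = re.search(r"[-*/+:]", formel).start()
    a = formel[start:op]                   # text between the start and the operator
    b = formel[op]                         # the operator itself
    c = formel[op + 1:].replace(")", "")   # the rest, minus the closing parenthesis
    return a, b, c

for value in ["P100+P200", "G100/G200", "a100*z200", "P1005-P2005", "SUMME(P400:P1200)"]:
    print(value, split_by_position(value))
```

Python positions are 0-based while PATINDEX is 1-based, but the differences between positions, which is all the slicing depends on, are the same.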
Upvotes: 0