Reputation: 605
Working on a 2016 version excel file where i need to copy only the specific characters which contains _ or - from a column and paste the related data into another column.
If any of the row contains "|" then remove everything after this symbol.
Input Data:
col1 col2
Disconnected on B_NKLA_TUV_0022 As Per
Roll DCN-BVLR-OCR-NKLT | Flapping
VAAP CLR_ NLTK_BKC_ASSO CBR Thing
VCR-DCW- NKL-NKJ-MLP | CBR-XDL-BHG
Disconnected on B_NFA_SUV_0025 As
Steps: First we need to remove all the character after "|" from col1 Then Exctract characters containing "" OR "-", If there is any space between 2 sentance containing "" OR "-", We need to combibne it and paste into col2
Output Data:
col1 col2
Disconnected on B_NKLA_TUV_0022 As Per B_NKLA_TUV_0022
Roll DCN-BVLR-OCR-NKLT | Flapping DCN-BVLR-OCR-NKLT
VAAP CLR_ NLTK_BKC_ASSO CBR Thing CLR_ NLTK_BKC_ASSO
Disconnected on 23 Dec 2022
VCR-DCW- NKL-NKJ-MLP | CBR-XDL-BHG VCR-DCW- NKL-NKJ-MLP
Disconnected on B_NFA_SUV_0025 As B_NFA_SUV_0025
Need Suggestion on it.
Upvotes: 1
Views: 413
Reputation: 76000
Excel 2016 - Formula:
One could try FILTERXML()
and use a simple predicate in the xpath expression to filter the correct substring‡:
Formula in B2:
=IFERROR(SUBSTITUTE(FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(LEFT(A1,FIND("|",A1&"|")),"- ","-|"),"_ ","_|")," ","</s><s>")&"</s></t>","//s[translate(.,'-_','')!=.]"),"|"," "),"")
‡ You can be much more restrictive if need be with xpath. This is done through anding more predicates. More information here.
PowerQuery:
Alternatively, you could load your table into PQ. Follow the steps described here to add a custom function based on a regular expression and invoke this using the following regular expression:
^[^|]*?(\\b[A-Z0-9]+(?:[-_] ?[A-Z0-9]+)+)\\b.*$|.
Replace this with a backreference to the 1st capture group $1
. Now you can, as per given link, simply replace error values with a NULL if so desired.
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Col1", type text}}),
#"Invoked Custom Function" = Table.AddColumn(#"Changed Type", "fnRegexExtr", each fnRegexExtr([Col1], "^[^|]*?(\\b[A-Z0-9]+(?:[-_] ?[A-Z0-9]+)+)\\b.*$|.", "$1")),
#"Replaced Errors" = Table.ReplaceErrorValues(#"Invoked Custom Function", {{"fnRegexExtr", null}})
in
#"Replaced Errors"
VBA:
Another alternative is creating your own custom User Defined Function (UDF) through VBA, and invoke this like a formula on your sheet. For example:
Public Function RegexReplace(s As String, pat As String, rep As String, Optional cse As Boolean = False) As String
Static RE As Object: If RE Is Nothing Then Set RE = CreateObject("vbscript.regexp")
RE.Pattern = pat
RE.IgnoreCase = cse
RegexReplace = RE.Replace(s, rep)
End Function
Now invoke this through using a slightly different pattern:
=RegexReplace(A2,"^[^|]*?(\b[A-Z0-9]+(?:[-_] ?[A-Z0-9]+)+)\b.*$|.+","$1")
Upvotes: 2
Reputation: 37155
Try the following formula-
=TEXTJOIN(" ",1,
IFERROR(FILTERXML("<t><s>"&SUBSTITUTE(TEXTBEFORE(A2,"|",,,,A2)," ","</s><s>")&"</s></t>",
"//s[contains(., '_') or contains(., '-')]"),""))
Upvotes: 3