Reputation: 77
=IFERROR(MID(I46,FIND("-",I46,8)-2,5),"")
I have a vast string of information copied into one cell from another program. There are several parts of this string that mean absolutely nothing for my current purpose. I want to extract the relevant information, it is always formatted the same way, but not always in the same position within the string. I am working with the above code which returns what I'm looking for, but also other variations when "-" is found, which can be often.
The information I want to extract will always be LetterLetter-NumberNumber, ex: AA-01 AA-02 AB-01, and so on, always 5 characters.
How can I extract just this?
Then, in another row if need be, I want to remove duplicate instances (there will nearly always be duplicates).
What I get now is;
HA-03
HA-03
T - S
Y - R
HA-03
HA-03
HA-03
HA-03
Y - R
HA-06
HA-06
R - S
HA-07
HA-09
HA-09
HA-09
First, I'd like to just get;
HA-03
HA-03
HA-03
HA-03
HA-03
HA-03
HA-06
HA-06
HA-07
HA-09
HA-09
HA-09
Then convert that into;
HA-03
HA-06
HA-07
HA-09
If there is a way to skip the middle-man, I'm all ears =)
Thank You.
Upvotes: 0
Views: 500
Reputation: 56004
If you want to avoid VBA, then try this:
Column A
asdfHA-03asdfasdf
HA-03sadfsa
asdfT - S
Y - Rasdfsad
asdfHA-03adf
asdHA-04
asdfsadf
Then on use this formula:
=IF(ISERROR(FIND(" ",IFERROR(MID(A1,FIND("-",A1)-2,5),""))),IFERROR(MID(A1,FIND("-",A1)-2,5),""),"")
This should exclude the ones with spaces
, then you can just copy paste as values, and Remove Duplicates
Upvotes: 0
Reputation: 3991
You could write a macro using Range.RemoveDuplicates
to de-duplicate your list and then Like
or regular expressions (see the code I posted in Parsing String Mixed with HTML, Words, Numbers, and Dates) in a loop over the remaining cells to check if cells meet your criteria and delete those that do not match.
De-duplicating then matching would probably be a bit more efficient, but you could do it either way around.
If you're new to VBA and want some code, add a comment and I'll post some.
EDIT:
You'll need to go to the Visual Basic editor (Alt-F11), select menu item Tools/References..., find and check "Microsoft VBScript Regular Expressions 5.5", then click OK. Then in the project explorer (Ctrl- R), right-click on the VBA Project
for your workbook and Insert
> Module
.
Add the following code:
Public Function RegEx(strInput As String, strRegEx As String, Optional bIgnoreCase As Boolean = True, Optional bMultiLine As Boolean = False) As Boolean
Dim RegExp As VBScript_RegExp_55.RegExp
Set RegExp = New VBScript_RegExp_55.RegExp
With RegExp
.MultiLine = bMultiLine
.IgnoreCase = bIgnoreCase
.Pattern = strRegEx
End With
RegEx = RegExp.test(strInput)
Set RegExp = Nothing
End Function
(You can add the other Regular Expression code from Parsing String Mixed with HTML, Words, Numbers, and Dates if you think you might use it later)
Add the following code (Assuming that the data that you want to delete is in Column A):
Public Sub DedupeAndFilter()
Dim RCtr As Long
ActiveSheet.Range("A:A").RemoveDuplicates Columns:=1, Header:=xlNo
For RCtr = ActiveSheet.UsedRange.Rows.Count To 1 Step -1
If ActiveSheet.Range("A:A").Rows(RCtr).Text = "" Then
ActiveSheet.Range("A:A").Rows(RCtr).Delete xlShiftUp
ElseIf Not RegEx(ActiveSheet.Range("A1").Rows(RCtr), "[A-Z]{2}-\d\d", True) Then
ActiveSheet.Range("A:A").Rows(RCtr).Delete xlShiftUp
End If
Next
End Sub
then, with the cursor in the DedupeAndFilter
code block, press F5
or click the green Run ">" triangle. The code will then remove duplicates, blank cells and non-conforming cells from Column A.
If you want to change the column affected, change "A:A" in ActiveSheet.Range("A:A")
to any other column reference, or substitute Activesheet.Selection
and select the column you want.
Upvotes: 2