Alberto Brown
Alberto Brown

Reputation: 355

find substring in a string, but look from the right to the left excel

I had originally asked the question below and found a good formula excel formula find part number in file path text string

So after using the =MID(A1,SEARCH("????-??????-??",A1),14) formula to find the part number names things seemed fine until I ran into data that looks like the sample set below.

c:\users\stuff\folder_name\folder_name_9876-vv-123-Ag\1234-000001-01_ baskets_1.pdf
c:\users\stuff\folder_name_9876-vv-123-Ag\random file_1234-000001-02_ baskets_2.pdf
c:\users\stuff\folder_name_XXXX-XXXXXX-XX\1234-000001-03_ baskets_3.pdf
c:\users\stuff\folder_name\1234-000030-01_ tree_30.pdf
c:\users\stuff\folder_name\random text_1234-000030-02_ tree_30.pdf
c:\users\stuff\folder_name\more random stuff_1234-000030-02_ tree_30.pdf

I am looking thru a few hundred thousand file names and I know I am missing some important files due to the false positive triggers. Like 9876-vv-123-Ag it technically fits the ????-??????-?? format but is not what I want to extract from the string. Now to the question, is there a way to apply =MID(A1,SEARCH("????-??????-??",A1),14) that can look from the right side of the string and work its way left to ensure that it grabs the last sub-string in the full string that meets the ????-??????-?? format?

or is there a way to set the wildcard search to numeric only instead of anything in that ????-??????-?? format?

Upvotes: 0

Views: 4247

Answers (2)

Ron Rosenfeld
Ron Rosenfeld

Reputation: 60324

If there are no other ambiguities, you can search just the file name portion of the file path, and return the matching string:

=MID(TRIM(RIGHT(SUBSTITUTE(A1,"\",REPT(" ",99)),99)),SEARCH("????-??????-??",TRIM(RIGHT(SUBSTITUTE(A1,"\",REPT(" ",99)),99))),14)

enter image description here

If there are other ambiguities, you will need to be more specific

This portion of the formula: TRIM(RIGHT(SUBSTITUTE(A1,"\",REPT(" ",99)),99)) returns the last substring where the string is split on the "\" character. If it might be the case that the file name might have more than 99 characters, then increase the 99 appropriately, or use something like LEN(A1) instead.

If you want to use a VBA routine, the following routine processed 500,000 rows in less than 4 seconds. It does depend on the part number consisting of only digits, but the change would be trivial if other patterns need to be included.

Option Explicit
Sub partNum()
Dim RE As Object
Dim R As Range, WS As Worksheet
Dim vSrc As Variant, vRes As Variant
Dim I As Long

Set WS = Worksheets("sheet1")
With WS
    vSrc = .Range(.Cells(1, 1), .Cells(.Rows.Count, 1).End(xlUp))
    Set R = .Range(.Cells(1, 2), .Cells(UBound(vSrc, 1), 2))
End With

ReDim vRes(1 To UBound(vSrc, 1), 1 To 1)

Set RE = CreateObject("vbscript.regexp")
With RE
    .Pattern = "\d{4}-\d{6}-\d{2}"

    For I = 1 To UBound(vSrc)
        If .test(vSrc(I, 1)) = True Then vRes(I, 1) = .Execute(vSrc(I, 1))(0)
    Next I
End With

R.EntireColumn.Clear
R = vRes

End Sub

VBA Function (stand-alone)

Option Explicit
Function getPartNum(S As String) As String
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
With RE
    .Pattern = "\d{4}-\d{6}-\d{2}(?!.*\\)"
    If .test(S) = True Then getPartNum = .Execute(S)(0)
End With
End Function

Upvotes: 1

VBasic2008
VBasic2008

Reputation: 54838

Obvious Solutions

Excel Substitution Version (6s for 50000 rows)

Works only for 2 possible occurrences of the string choosing the last.

=IF(ISERROR(SEARCH("????-??????-??";A2));"";IF(ISERROR(SEARCH("????-??????-??";A2;SEARCH("????-??????-??";A2)+1));MID($A2;SEARCH("????-??????-??";A2);LEN("????-??????-??"));MID($A2;SEARCH("????-??????-??";A2;SEARCH("????-??????-??";A2)+1);LEN("????-??????-??"))))

A Sub Version (96s for 50000 rows)

Sub SearchQ2()

  Const cStrSource As String = "A2"
  Const cStrTarget As String = "B2"
  Const cStrSearch As String = "????-??????-??"

  Dim vntRange As Variant
  Dim lngSearch As Long
  Dim intStart As Integer
  Dim lng1 As Long
  Dim strTemp As String

  vntRange = Range(cStrSource).Resize(Cells(Rows.Count, Range("A1").Column) _
      .End(xlUp).Row - Range(cStrSource).Row + 1)

  For lng1 = 1 To UBound(vntRange)
    intStart = 1
    Do
      On Error Resume Next
      lngSearch = WorksheetFunction.Search(cStrSearch, _
          vntRange(lng1, 1), intStart)
      If Err Then
        Exit Do
       Else
        strTemp = Mid(vntRange(lng1, 1), lngSearch, Len(cStrSearch))
        intStart = lngSearch + Len(cStrSearch) + 1
      End If
    Loop
    vntRange(lng1, 1) = strTemp
  Next

  Range(cStrTarget).Resize(Cells(Rows.Count, Range("A1").Column) _
      .End(xlUp).Row - Range(cStrSource).Row + 1) = vntRange

End Sub

The Worst Function Ever

It works fine for a hundred, maybe a few hundred cells. This might be an inspiration for someone to write an efficient function.

It loops through the string to find the last occurrence of the SearchString.

Function SearchQ(SearchString As String, Cell As Range) As String
  Application.Volatile
  Dim lngSearch As Long
  Dim intStart As Integer
  intStart = 1
  Do
    On Error Resume Next
    lngSearch = WorksheetFunction.Search(SearchString, _
        Cell.Cells(1, 1).Text, intStart)
    If Err Then
      Exit Do
     Else
      SearchQ = Mid(Cell.Cells(1, 1).Text, lngSearch, Len(SearchString))
      intStart = lngSearch + Len(SearchString) + 1
    End If
  Loop
End Function

Upvotes: 0

Related Questions