LJG
LJG

Reputation: 767

Extract the substrings under certain conditions

I have the following string in Excel: C37S25FF0000TD. In 7 different cells, I need to extract 7 substring according to these directions:

Can someone help me? Since substrings can have variable length I don't know how to use the function MID() correctly. Thanks in advance.

Upvotes: 0

Views: 441

Answers (3)

Ron Rosenfeld
Ron Rosenfeld

Reputation: 60344

Edited to allow for leading zero's in second substring

Here is a set of formulas that take into account the fact that some of the substrings may be of varying length.

Some of the formulas use functions only found in Office 365. However, substitutes for earlier versions are also available:

Letters up to first number: = LEFT(A1,MIN(FIND({0,1,2,3,4,5,6,7,8,9},A1&"0123456789"))-1)

Next first full number (variable number of digits: 
    =LET(s,MID(A1,LEN(B1)+1,LEN(A1)),
       arr, MID(s,SEQUENCE(LEN(s)),1),
       len, MATCH(TRUE,ISERR(-arr),0),
       LEFT(s,len-1))

Subsequent letter(s): 
        =LET(s,MID(A1,SUM(LEN(B1:B2))+1,LEN(A1)),
           pos,MIN(FIND({0,1,2,3,4,5,6,7,8,9},s&"0123456789")),
           LEFT(s,pos-1))

Next two characters: =MID(A1,SUM(LEN(B1:B3))+1,2)

For the subsequent substrings, we can count from the right:
   First 6 of the last 8 characters: =MID(A1,LEN(A1)-7,6)

    Penultimate character: =MID(A1,LEN(A1)-1,1)

    Last character:  =RIGHT(A1)

enter image description here

Upvotes: 2

JvdV
JvdV

Reputation: 75920

Reminded me of this UDF that can come in handy if you'd change the pattern:

Function GetData(inp As String, grp As Long) As String

With CreateObject("VBScript.RegExp")
    .Pattern = "^([A-Z]+)(\d+)([A-Z]+)(\d\d)(.{6})(.)(.)$"
    If .Test(inp) Then
        GetData = .Execute(inp)(0).Submatches(grp - 1)
    Else
        GetData = "No Data Found"
    End If
End With

End Function

Here is an online demo. It would match:

  • ^ - Start-line anchor;
  • ([A-Z]+) - A 1st capture group to match 1+ uppercase alpha-chars;
  • (\d+) - A 2nd capture group to match 1+ digits;
  • ([A-Z]+) - A 3rd group identical to 1st;
  • (\d\d) - A 4th group matching exactly two digits;
  • (.{6}) - A 5th group matching exactly 6 characters other than newline;
  • (.) - A 6th group capturing the 2nd to last character;
  • (.) - A final 7th group matching the last character;
  • $ - End-line anchor.

Call through =GetData(A1,1) where the 2nd parameter can be 1-7.


Also thought it's fun to see how to mimic this with a single formula:

enter image description here

Formula in B1:

=LET(A,A1,B,LEFT(A,LEN(A)-8),C,TEXTSPLIT(B,SEQUENCE(10,,0),,1),D,TEXTSPLIT(B,C,,1),E,MID(RIGHT(A,8),{1,7,8},{6,1,1}),HSTACK(TOROW(VSTACK(C,D),,1),E))

Upvotes: 3

Solar Mike
Solar Mike

Reputation: 8385

So, based on the single example you gave and making no assumptions about changing positions:

enter image description here

You can edit to deal with any of the numbers that may contain extra digits etc by expanding on the technique using len() and find() as necessary.

Upvotes: 0

Related Questions