yar
yar

Reputation: 47

How to get integers from string?

I scraped text from old files and need to get numerical data placed within strings.

Strings look like:

"season: 1983 colony: 23 colony weight: 4 kg yeild: 12 kg
"season: 1983 colony:- colony weight: 5 kg yeild: 14 kg"

I made a function that takes a raw data string and returns an array of integers.

Function getClearBeeData(rawData As Variant) As Integer()
  Dim retValue(4) As Integer 'array where each found number stored
  Dim strTempString As String 'temporary string to hold current number
  Dim i, k As Integer 'i counter for original string, k counter for array position
  Dim token As Boolean 'token shows whether previous chars were number
  token = False
  
  For i = 1 To Len(rawData)   'go through original string
  
    If IsNumeric(Mid(rawData, i, 1)) Then   'if current char is numeric
      strTempString = strTempString & Mid(rawData, i, 1)  'add current char to remporary string
      token = True  'show that current position is within numbers
    ElseIf Mid(rawData, i, 1) = Chr(45) Then  'in string a symbol "-" can appear
      strTempString = "0"
      token = True
    ElseIf Not IsNumeric(Mid(rawData, i, 1)) And token = True Then  'if current char is not numeric and token shows that previous char was number
      retValue(k) = CInt(strTempString)   'convert temporary string to int and write in to the array
      k = k + 1   'go to next array position
      token = False   'switch token to show that current position is not within numbers
      strTempString = ""  'delete stored data from temporary string
    End If
    
  Next
  
  If Len(strTempString) > 0 Then
    retValue(k) = CInt(strTempString) 'if original string ended with numbers, write that numbers to array
  End If
  getClearBeeData = retValue
End Function

Test sub to print data.

Sub printClearBeeData()
  Dim rawData As String
  Dim clearDataArr() As Integer
  Dim i As Integer
  rawData = "season: 1983 colony: 12 colony weight: - kg yeild: 16 kg"
  clearDataArr = getClearBeeData(rawData)
  For i = LBound(clearDataArr) To UBound(clearDataArr) - 1
    Debug.Print clearDataArr(i)
  Next
End Sub

Everything works. Could I do it better? (As I work alone nobody can code review.)
I didn't use regular expressions because I don't know them.

Upvotes: 1

Views: 209

Answers (4)

T.M.
T.M.

Reputation: 9948

Split combined with Filter allows one loop only

Just for fun and in addition to @FaneDuru 's valid answer, I demonstrate how to use the Split() function combined with Filter() thus avoiding an additional transfer of array items in a help function by a second loop.

Sub TrickySplit()
    Dim rawData As String
    rawData = "season: 1983 colony: 12 colony weight: - kg yeild: 16 kg"

    '[1] split to tokens omitting the 1st element "seasons"
    '    ~> 1983 colony, 12 colony weight, - kg yeild, 16 kg
    Dim words: words = Split(rawData, ": "): words = Filter(words, words(0), False)

    '[2] convert number strings or "-" to values
    Dim i: For i = 0 To UBound(words): words(i) = Val(words(i)): Next

    '[3] optional (display results in VB Editors Immediate Window
    '    ~> 1983,12,0,16
    Debug.Print Join(words, ",")
End Sub

Further hints

ad) [1]: the classical Split() via delimiter : " results in a 0-based array:

   season|1983 colony|12 colony weight|- kg yeild|16 kg 

The first item words(0) ~> season isn't needed and can be removed immediately by the Filter() function resulting in:

   1983 colony, 12 colony weight, - kg yeild, 16 kg

ad) [2]: the Val() function words(i) = Val(words(i)) uses the fact that it ignores following characters even interpreting "-" as 0

(So it's not necessary to execute a theoretical second split via words(i) =Val(Split(words(i), " ")(0)) to remove the appendix-string).

Upvotes: 3

FaneDuru
FaneDuru

Reputation: 42256

Test also this (standard VBA) piece of code, please. It is a little shorter and simpler then yours:

Private Function extractIntegers(rowData As String, strDelim As String) As Variant
 Dim arr As Variant, arrInt(3) As Variant, i As Long

 arr = Split(rowData, strDelim)

 For i = 1 To UBound(arr)
    arrInt(i - 1) = val(arr(i))
 Next i
 extractIntegers = arrInt
End Function

A procedure to test it, would be:

Sub testexractIntegers()
  Dim rowData As String, El As Variant, arrInt As Variant
  rowData = "season: 1983 colony: 23 colony weight: 4 kg yeild: 12 kg"
  'rowData = "season: 1983 colony: - colony weight: 4 kg yeild: 12 kg"
  arrInt = extractIntegers(rowData, ": ")
  For Each El In arrInt
    Debug.Print Int(El)
  Next
End Sub

If you uncomment the second rowData definition (containing "-"), the array will return 0 for that specific array element

Upvotes: 3

JvdV
JvdV

Reputation: 75990

Since your string pattern would always be the same (where one or multiple substrings could be represented by "-"), RegEx could become a simple, easy to implement tool to retrieve these substrings. A very basic example to get you going:

Sub Test()

Dim rawData As String, ClearBeeData as Object

rawData = "season: 1983 colony: 12 colony weight: - kg yeild: 16 kg"
With CreateObject("vbscript.regexp")
    .Global = True
    .Pattern = "\d+|-"
    Set ClearBeeData = .Execute(rawData)
End With

For Each el In ClearBeeData
    Debug.Print Val(el)
Next

End Sub

The link provided by @Peh will give you tons of information, but some small remarks to make here:

  • .Global = True - If this was set to False we can also retrieve the first match. When set to True we can retrieve all matches.
  • .Pattern = "\d+|-" - A very basic pattern to search within the full string where \d is simply short for character class [0-9] and + searches for a substring of at least one character (or longer). The pipe-symbol represents OR so if the position doesn't hold any numbers, we can search for the - alternatively.
  • .Execute(...) - Will return a Matchcollection type object of matches that are found. Note that it would yield an error when no matches can be found. No big deal here when strings have the same pattern, but for future reference you might want to include a .Test before we try to .Execute.
  • Val(el) - Since all elements in the returned collection are text, we can use Val to return them as numbers if you so desire. Cool feature is that Val("-") will return 0. So for above example your results will be:

    1983
    12
    0
    16
    

Upvotes: 5

freeflow
freeflow

Reputation: 4355

A solution which does not rely on regexp but which does require that numbers are always surrounded by spaces.

The function returns a scripting dictionary of Index vs Number where Index is the first character of the number in the raw data string and Number is the numeric string converted to Type Double.

Option Explicit

Sub TestGetClearBeeData()

    Dim mySD As Scripting.Dictionary

    Set mySD = getClearBeeData("season: 1983 colony: - colony weight: 5 kg yeild: 14 kg")

    Dim myItem As Variant
    For Each myItem In mySD

        Debug.Print myItem, mySD.Item(myItem)

    Next

End Sub


Function getClearBeeData(ByVal ipRawData As String) As Scripting.Dictionary

    Dim myItems As Variant
    myItems = Split(Replace(ipRawData, "-", "0"))

    Dim myItem As Variant
    Dim myNumbers As Scripting.Dictionary: Set myNumbers = New Scripting.Dictionary
    Dim myLen As Long: myLen = 1
    For Each myItem In myItems

        If IsNumeric(myItem) Then

            myNumbers.Add myLen, CDbl(myItem)

        End If

        myLen = myLen + Len(myItem) + 1

    Next

    Set getClearBeeData = myNumbers

End Function

Upvotes: 1

Related Questions