Reputation: 47
I scraped text from old files and need to get numerical data placed within strings.
Strings look like:
"season: 1983 colony: 23 colony weight: 4 kg yeild: 12 kg
"season: 1983 colony:- colony weight: 5 kg yeild: 14 kg"
I made a function that takes a raw data string and returns an array of integers.
Function getClearBeeData(rawData As Variant) As Integer()
Dim retValue(4) As Integer 'array where each found number stored
Dim strTempString As String 'temporary string to hold current number
Dim i, k As Integer 'i counter for original string, k counter for array position
Dim token As Boolean 'token shows whether previous chars were number
token = False
For i = 1 To Len(rawData) 'go through original string
If IsNumeric(Mid(rawData, i, 1)) Then 'if current char is numeric
strTempString = strTempString & Mid(rawData, i, 1) 'add current char to remporary string
token = True 'show that current position is within numbers
ElseIf Mid(rawData, i, 1) = Chr(45) Then 'in string a symbol "-" can appear
strTempString = "0"
token = True
ElseIf Not IsNumeric(Mid(rawData, i, 1)) And token = True Then 'if current char is not numeric and token shows that previous char was number
retValue(k) = CInt(strTempString) 'convert temporary string to int and write in to the array
k = k + 1 'go to next array position
token = False 'switch token to show that current position is not within numbers
strTempString = "" 'delete stored data from temporary string
End If
Next
If Len(strTempString) > 0 Then
retValue(k) = CInt(strTempString) 'if original string ended with numbers, write that numbers to array
End If
getClearBeeData = retValue
End Function
Test sub to print data.
Sub printClearBeeData()
Dim rawData As String
Dim clearDataArr() As Integer
Dim i As Integer
rawData = "season: 1983 colony: 12 colony weight: - kg yeild: 16 kg"
clearDataArr = getClearBeeData(rawData)
For i = LBound(clearDataArr) To UBound(clearDataArr) - 1
Debug.Print clearDataArr(i)
Next
End Sub
Everything works. Could I do it better? (As I work alone nobody can code review.)
I didn't use regular expressions because I don't know them.
Upvotes: 1
Views: 209
Reputation: 9948
Split
combined with Filter
allows one loop only
Just for fun and in addition to @FaneDuru 's valid answer, I demonstrate how to use the Split()
function combined with Filter()
thus avoiding an additional transfer of array items in a help function by a second loop.
Sub TrickySplit()
Dim rawData As String
rawData = "season: 1983 colony: 12 colony weight: - kg yeild: 16 kg"
'[1] split to tokens omitting the 1st element "seasons"
' ~> 1983 colony, 12 colony weight, - kg yeild, 16 kg
Dim words: words = Split(rawData, ": "): words = Filter(words, words(0), False)
'[2] convert number strings or "-" to values
Dim i: For i = 0 To UBound(words): words(i) = Val(words(i)): Next
'[3] optional (display results in VB Editors Immediate Window
' ~> 1983,12,0,16
Debug.Print Join(words, ",")
End Sub
Further hints
ad) [1]
: the classical Split()
via delimiter : "
results in a 0-based array:
season|1983 colony|12 colony weight|- kg yeild|16 kg
The first item words(0)
~> season
isn't needed and can be removed immediately by the Filter()
function resulting in:
1983 colony, 12 colony weight, - kg yeild, 16 kg
ad) [2]
: the Val()
function words(i) = Val(words(i))
uses the fact that it ignores following characters even interpreting "-"
as 0
(So it's not necessary to execute a theoretical second split via words(i) =Val(Split(words(i), " ")(0))
to remove the appendix-string).
Upvotes: 3
Reputation: 42256
Test also this (standard VBA) piece of code, please. It is a little shorter and simpler then yours:
Private Function extractIntegers(rowData As String, strDelim As String) As Variant
Dim arr As Variant, arrInt(3) As Variant, i As Long
arr = Split(rowData, strDelim)
For i = 1 To UBound(arr)
arrInt(i - 1) = val(arr(i))
Next i
extractIntegers = arrInt
End Function
A procedure to test it, would be:
Sub testexractIntegers()
Dim rowData As String, El As Variant, arrInt As Variant
rowData = "season: 1983 colony: 23 colony weight: 4 kg yeild: 12 kg"
'rowData = "season: 1983 colony: - colony weight: 4 kg yeild: 12 kg"
arrInt = extractIntegers(rowData, ": ")
For Each El In arrInt
Debug.Print Int(El)
Next
End Sub
If you uncomment the second rowData
definition (containing "-"), the array will return 0 for that specific array element
Upvotes: 3
Reputation: 75990
Since your string pattern would always be the same (where one or multiple substrings could be represented by "-"), RegEx
could become a simple, easy to implement tool to retrieve these substrings. A very basic example to get you going:
Sub Test()
Dim rawData As String, ClearBeeData as Object
rawData = "season: 1983 colony: 12 colony weight: - kg yeild: 16 kg"
With CreateObject("vbscript.regexp")
.Global = True
.Pattern = "\d+|-"
Set ClearBeeData = .Execute(rawData)
End With
For Each el In ClearBeeData
Debug.Print Val(el)
Next
End Sub
The link provided by @Peh will give you tons of information, but some small remarks to make here:
.Global = True
- If this was set to False
we can also retrieve the first match. When set to True
we can retrieve all matches..Pattern = "\d+|-"
- A very basic pattern to search within the full string where \d
is simply short for character class [0-9]
and +
searches for a substring of at least one character (or longer). The pipe-symbol represents OR
so if the position doesn't hold any numbers, we can search for the -
alternatively..Execute(...)
- Will return a Matchcollection type object of matches that are found. Note that it would yield an error when no matches can be found. No big deal here when strings have the same pattern, but for future reference you might want to include a .Test
before we try to .Execute
.Val(el)
- Since all elements in the returned collection are text, we can use Val
to return them as numbers if you so desire. Cool feature is that Val("-")
will return 0. So for above example your results will be:
1983
12
0
16
Upvotes: 5
Reputation: 4355
A solution which does not rely on regexp but which does require that numbers are always surrounded by spaces.
The function returns a scripting dictionary of Index vs Number where Index is the first character of the number in the raw data string and Number is the numeric string converted to Type Double.
Option Explicit
Sub TestGetClearBeeData()
Dim mySD As Scripting.Dictionary
Set mySD = getClearBeeData("season: 1983 colony: - colony weight: 5 kg yeild: 14 kg")
Dim myItem As Variant
For Each myItem In mySD
Debug.Print myItem, mySD.Item(myItem)
Next
End Sub
Function getClearBeeData(ByVal ipRawData As String) As Scripting.Dictionary
Dim myItems As Variant
myItems = Split(Replace(ipRawData, "-", "0"))
Dim myItem As Variant
Dim myNumbers As Scripting.Dictionary: Set myNumbers = New Scripting.Dictionary
Dim myLen As Long: myLen = 1
For Each myItem In myItems
If IsNumeric(myItem) Then
myNumbers.Add myLen, CDbl(myItem)
End If
myLen = myLen + Len(myItem) + 1
Next
Set getClearBeeData = myNumbers
End Function
Upvotes: 1