Daniel
Daniel

Reputation: 57

Determine total number of atoms in a chemical formula

I have a list of thousands of chemical formulas that could include symbols for any element. I would like to determine the total number of atoms of any element in each formula. Examples include:

I want the total number of atoms in a single formula, so for the first example (CH3NO3), the answer would be 8 (1 carbon + 3 hydrogens + 1 nitrogen + 3 oxygens).

I found code by PEH (Extract numbers from chemical formula) that uses regular expression to extract the number of instances of a specific element in a chemical formula.

Could this be adapted to give the total atoms?

Public Function ChemRegex(ChemFormula As String, Element As String) As Long
    Dim regEx As New RegExp
    With regEx
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
    End With
    
    'first pattern matches every element once
    regEx.Pattern = "([A][cglmrstu]|[B][aehikr]?|[C][adeflmnorsu]?|[D][bsy]|[E][rsu]|[F][elmr]?|[G][ade]|[H][efgos]?|[I][nr]?|[K][r]?|[L][airuv]|[M][cdgnot]|[N][abdehiop]?|[O][gs]?|[P][abdmortu]?|[R][abefghnu]|[S][bcegimnr]?|[T][abcehilms]|[U]|[V]|[W]|[X][e]|[Y][b]?|[Z][nr])([0-9]*)"
    
    Dim Matches As MatchCollection
    Set Matches = regEx.Execute(ChemFormula)
    
    Dim m As Match
    For Each m In Matches
        If m.SubMatches(0) = Element Then
            ChemRegex = ChemRegex + IIf(Not m.SubMatches(1) = vbNullString, m.SubMatches(1), 1)
        End If
    Next m
    
    'second patternd finds parenthesis and multiplies elements within
    regEx.Pattern = "(\((.+?)\)([0-9])+)+?"
    Set Matches = regEx.Execute(ChemFormula)
    For Each m In Matches
        ChemRegex = ChemRegex + ChemRegex(m.SubMatches(1), Element) * (m.SubMatches(2) - 1) '-1 because all elements were already counted once in the first pattern
    Next m
End Function

Upvotes: 2

Views: 526

Answers (2)

JvdV
JvdV

Reputation: 75930

Here's my two cent's

enter image description here

Formula in C1:

=ChemRegex(A1)

Where ChemRegex() calls:

Public Function ChemRegex(ChemFormula As String) As Long

With CreateObject("vbscript.regexp")
    .Global = True
    .Pattern = "[A-Z][a-z]*(\d*)"
    If .Test(ChemFormula) Then
        Set matches = .Execute(ChemFormula)
        For Each Match In matches
            ChemRegex = ChemRegex + IIf(Match.Submatches(0) = "", 1, Match.Submatches(0))
        Next
    Else
        ChemRegex = 0
    End If
End With

End Function

Or in a (shorter) 2-step regex-solution:

Public Function ChemRegex(ChemFormula As String) As Long

With CreateObject("vbscript.regexp")
    .Global = True
    .Pattern = "([A-Za-z])(?=[A-Z]|$)"
    ChemFormula = .Replace(ChemFormula, "$1-1")
    .Pattern = "\D+"
    ChemFormula = .Replace(ChemFormula, "+")
    ChemRegex = Evaluate(ChemFormula)
End With

End Function

Upvotes: 1

Pᴇʜ
Pᴇʜ

Reputation: 57743

You could do that by looping through all characters. Count all capital characters and add all numbers subtracted by 1. That is the total count of elements.

Option Explicit

Public Function ChemCountTotalElements(ByVal ChemFormula As String) As Long
    Dim RetVal As Long

    Dim c As Long
    For c = 1 To Len(ChemFormula)
        Dim Char As String
        Char = Mid$(ChemFormula, c, 1)
        
        If IsNumeric(Char) Then
            RetVal = RetVal + CLng(Char) - 1
        ElseIf Char = UCase(Char) Then
            RetVal = RetVal + 1
        End If
        
    Next c
    
    ChemCountTotalElements = RetVal
End Function

Note that this does not handle parenthesis! And it does not check if the element actually exists. So XYZ2 will be counted as 4.

Also this only can handle numbers below 10. In case you have numbers with 10 and above use the RegEx solution below (which can handle that).

enter image description here

Recognize also chemical formulas with prenthesis like Ca(OH)₂

If you need a more precise way (checking the existance of the Elements) and recognizing parenthesis you need to do it with RegEx again.

Because VBA doesn't support regular expressions out of the box we need to reference a Windows library first.

  1. Add reference to regex under Tools then References
    enter image description here

  2. and selecting Microsoft VBScript Regular Expression 5.5
    enter image description here

  3. Add this function to a module

    Public Function ChemRegexCountTotalElements(ByVal ChemFormula As String) As Long
        Dim RetVal As Long
    
        Dim regEx As New RegExp
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
        End With
    
        'first pattern matches every element once
        regEx.Pattern = "([A][cglmrstu]|[B][aehikr]?|[C][adeflmnorsu]?|[D][bsy]|[E][rsu]|[F][elmr]?|[G][ade]|[H][efgos]?|[I][nr]?|[K][r]?|[L][airuv]|[M][cdgnot]|[N][abdehiop]?|[O][gs]?|[P][abdmortu]?|[R][abefghnu]|[S][bcegimnr]?|[T][abcehilms]|[U]|[V]|[W]|[X][e]|[Y][b]?|[Z][nr])([0-9]*)"
    
        Dim Matches As MatchCollection
        Set Matches = regEx.Execute(ChemFormula)
    
        Dim m As Match
        For Each m In Matches
            RetVal = RetVal + IIf(Not m.SubMatches(1) = vbNullString, m.SubMatches(1), 1)
        Next m
    
        'second patternd finds parenthesis and multiplies elements within
        regEx.Pattern = "(\((.+?)\)([0-9]+)+)+?"
        Set Matches = regEx.Execute(ChemFormula)
        For Each m In Matches
            RetVal = RetVal + ChemRegexCountTotalElements(m.SubMatches(1)) * (m.SubMatches(2) - 1) '-1 because all elements were already counted once in the first pattern
        Next m
    
        ChemRegexCountTotalElements = RetVal
    End Function
    

While this code will also recognize parenthesis, note that it does not recognize nested parenthesis.

enter image description here

Upvotes: 6

Related Questions