Codeman
Codeman

Reputation: 507

Counting the number of elements in a molecular compound with Python (recursion if possible)?

So I'm trying to code something to tell me the number of elements in any given compound. I'm not even sure where to start: I tried coding something but then realized that it only worked for simple compounds (or did not work at all). Here's an example of what I want:

>>> function_input : 'NaMg3Al6(BO3)3Si6O18(OH)4', 'O'

>>> function_return : 31

I've come so far in my mess of a code (IT DOESN'T WORK, it just illustrates my rough thought process):

def get_pos_from_count(string: str, letter: str):
    count = string.count(letter)
    lens = [-1]
    for i in range(count):
        lens += [string[lens[i] + 1:].index(letter) + lens[i] + 1]
    return lens[1:]



def find_number(string, letter):
    if string.count(letter) == 0: return 0
    numbers = '1234567890'
    try:
        mul1 = int(string[0])
    except ValueError:
        mul1 = 0
    mul2 = []
    sub_ = 1
    list_of_positions = get_pos_from_count(string, letter)
    for i in list_of_positions:
        try:
            sub_ += int(string[i + 1]) if string[i + 1] in numbers else 0
        except IndexError: pass
        if string[i + 1:].count(')') > string[i + 1].count('('):
            try:
                mul2 += int(string[string[i + 1:].count(')') + 1])
            except (IndexError, ValueError): pass
    return mul1 * sub_ * mul2

The approach I was trying to implement was:

  1. Find the number of occurrences of said element in compound.
  2. Find each subscript, multiplying by subscript outside bracket if said element is in bracket.
  3. Sum up all subscripts, multiply by number of compounds (first character in string)
  4. Return said number to user

But then I realized my code would either be extremely long or require recursion, which I don't know how to apply here.

If possible, I'd like a semi-working function, but a quick tip on how to approach this is also helpful!

And I don't want to use external libraries if possible.

tl;dr: This question for atomicity of elements, without external libraries (if possible).

EDIT: Yes, the question I linked does have hints on how to do this, but when I tried to make any code work for only one element, and set it's weight to 1, I ran into a host of issues I don't know to solve.

Upvotes: 3

Views: 1111

Answers (2)

Stef
Stef

Reputation: 15515

Let us divide the task into three parts:

  • Tokenize the string into a list of elements, numbers, and brackets;
  • Parse the bracket to have a nested list with sublists;
  • Count the elements in the nested list.

Introducing my tools:

from more_itertools import split_when, pairwise
from itertools import chain
from collections import Counter

def nest_brackets(tokens, i = 0):
    l = []
    while i < len(tokens):
        if tokens[i] == ')':
            return i,l
        elif tokens[i] == '(':
            i,subl = nest_brackets(tokens, i+1)
            l.append(subl)
        else:
            l.append(tokens[i])
        i += 1
    return i,l

def parse_compound(s):
    tokens = [''.join(t) for t in split_when(s, lambda a,b: b.isupper() or b in '()' or (b.isdigit() and not a.isdigit()))]
    tokens = [(int(t) if t.isdigit() else t) for t in tokens]
    i, l = nest_brackets(tokens)
    assert(i == len(tokens)) # crash if unmatched ')'
    return l

def count_elems(parsed_compound):
    c = Counter()
    for a,b in pairwise(chain(parsed_compound, (1,))):
        if not isinstance(a, int):
            subcounter = count_elems(a) if isinstance(a, list) else {a: 1}
            n = b if isinstance(b, int) else 1
            for elem,k in subcounter.items():
                c[elem] += k * n
    return c

s = 'NaMg3Al6(B(CO2)3)3Si6O18(OH)4'

l = parse_compound(s)
print(l)
# ['Na', 'Mg', 3, 'Al', 6, ['B', ['C', 'O', 2], 3], 3, 'Si', 6, 'O', 18, ['O', 'H'], 4]


c = count_elems(l)
print(c)
# Counter({'O': 40, 'C': 9, 'Al': 6, 'Si': 6, 'Mg': 3, 'B': 3, 'Na': 1})

print(c['O'])
# 40

Upvotes: 2

Riccardo Bucco
Riccardo Bucco

Reputation: 15364

Try with this recursive function:

import re

def find_number(s, e):
    if s == '':
        return 0
    x = re.search('\(([A-Za-z0-9]*)\)([0-9]*)', s)
    if x is not None:
        return (find_number(s[:x.start()] + s[x.end():], e) +
                find_number(x.groups()[0], e) * int(x.groups()[1] or '1'))
    return sum(int(x.groups()[0] or '1')
               for x in re.finditer(f'{e}([0-9]*)', s))

Example:

>>> find_number('NaMg3Al6(BO3)3Si6O18(OH)4', 'O')
31

Upvotes: 0

Related Questions