Rob Kwasowski

Reputation: 2780

Python: Regex with raw f-string only works 50% of the time

I have a dictionary of currency codes, each with a set of country codes. I want to look up the currency of a particular country with a regex, so I started writing a pattern, but the one I've got at the moment only works about 50% of the time. Is this a bug in Python?

This is the code. Make sure to run it at least five to ten times to see that it only works some of the time.

import re

local_currencies = str({
    'GBP': {'UK'},
    'USD': {'US'},
    'EUR': {'FR', 'DE', 'IT'},
})

country = 'FR'
pattern = fr"'.{{3}}': ?\{{'{country}'"

print(re.search(pattern, local_currencies))

Upvotes: 1

Views: 124

Answers (3)

dawg

Reputation: 104032

An alternate approach is to invert the dict of sets to be a dict of countries and their currencies:

local_currencies = {
    'GBP': {'UK'},
    'USD': {'US'},
    'EUR': {'FR', 'DE', 'IT'},
}

invert = {}
for currency, countries in local_currencies.items():
    for country in countries:
        invert[country] = currency

>>> invert
{'UK': 'GBP', 'US': 'USD', 'IT': 'EUR', 'DE': 'EUR', 'FR': 'EUR'}
>>> invert['FR']
'EUR'

Upvotes: 0

Tomalak

Reputation: 338326

In general, it's not a good idea to convert complex data structures to strings and then use string operations on the result to make statements about the contained data. String operations such as regex are (literally) dumb: they know nothing about the structure the string came from.

Keep the data structure and access it directly. Given this dict with nested sets:

local_currencies = {
    'GBP': {'UK'},
    'USD': {'US'},
    'EUR': {'FR', 'DE', 'IT'},
}

it's easy to answer a question like "Which currencies are being used in country X?" with a list comprehension:

country = 'FR'
currencies = [curr for curr, countries in local_currencies.items() if country in countries]

Result:

['EUR']

For countries with multiple currencies, the list would be longer.
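
To illustrate the multiple-currency case, here is a minimal sketch. The `'PAB'` entry, the extra `'PA'` country code, and the helper name `currencies_for` are assumptions added for illustration only; they are not part of the question's data.

```python
local_currencies = {
    'GBP': {'UK'},
    'USD': {'US', 'PA'},   # 'PA' added for illustration only
    'EUR': {'FR', 'DE', 'IT'},
    'PAB': {'PA'},         # illustrative entry, not in the question's data
}

def currencies_for(country):
    """Return a sorted list of every currency whose country set contains `country`."""
    return sorted(
        curr
        for curr, countries in local_currencies.items()
        if country in countries
    )

print(currencies_for('PA'))  # ['PAB', 'USD']
print(currencies_for('FR'))  # ['EUR']
print(currencies_for('JP'))  # []
```

Sorting the result keeps the output stable from run to run, and an unknown country simply yields an empty list.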

Upvotes: 1

Aaron Bentley

Reputation: 1380

This fails because {'FR', 'DE', 'IT'} is a set, and sets have no defined order. When the search succeeds, it's because 'FR' appeared first in the set's string representation. When it fails, it's because 'DE' or 'IT' came first. The order can change from run to run because Python randomizes string hashes per process by default. This is not a bug in Python. You cannot expect consistent ordering from an unordered collection.
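
The effect can be reproduced deterministically by running the question's pattern against two hand-written strings that differ only in the order of the set's elements. This is only a sketch: the strings `fr_first` and `de_first` are assumed examples of what `str()` might produce on different runs, not captured output.

```python
import re

country = 'FR'
pattern = fr"'.{{3}}': ?\{{'{country}'"  # same pattern as in the question

# Two possible string forms of the same dict; only the set order differs.
fr_first = "{'GBP': {'UK'}, 'USD': {'US'}, 'EUR': {'FR', 'DE', 'IT'}}"
de_first = "{'GBP': {'UK'}, 'USD': {'US'}, 'EUR': {'DE', 'FR', 'IT'}}"

match_fr_first = re.search(pattern, fr_first)
match_de_first = re.search(pattern, de_first)

print(match_fr_first.group())  # 'EUR': {'FR'
print(match_de_first)          # None
```

The pattern requires 'FR' to sit directly after the opening brace, so it only matches when 'FR' happens to be printed first.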

I strongly recommend you use a different approach. It is very bad form to depend on the string representation of Python objects. Instead, you could create a reverse mapping, e.g. country_to_currency = {'FR': 'EUR', 'DE': 'EUR', 'US': 'USD'}. You can then simply look up country_to_currency['FR'].
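
One possible way to build that reverse mapping from the question's data, rather than typing it by hand, is a nested dict comprehension. This is a sketch that assumes each country appears under exactly one currency; if a country were listed twice, the later entry would silently win.

```python
local_currencies = {
    'GBP': {'UK'},
    'USD': {'US'},
    'EUR': {'FR', 'DE', 'IT'},
}

# Flatten currency -> {countries} into country -> currency.
country_to_currency = {
    country: currency
    for currency, countries in local_currencies.items()
    for country in countries
}

print(country_to_currency['FR'])      # EUR
print(country_to_currency.get('JP'))  # None (unknown country)
```

Using `.get()` avoids a `KeyError` for countries that are not in the data.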

Upvotes: 5
