I have this list (only part of the full list):

```python
list1 = ['A1', 'A2.1', 'A2.2','A2.3','A2.4','B1.1','B1.2','B1.3','B1.4','B1.5','B1.6','B1.7','B1.8a','B1.8b','B2.1','B2.2','B2.3','B2.4','B2.5','B2.6','B2.7','B2.8','B2.9','B2.10','B2.11','B2.12','B2.13','B2.14','B2.15','B2.16','B2.17','B2.18','B2.19','B2.20a','B2.20b','B2.20c']
```

and this string:

```python
string1 = "A1Contributo pubblico1559.020• 559.020,00A2.2Cofinanziamentoprivato in denaro122.500• 22.500,00A2.4Entrate generate dalprogetto00• 0,00B2.20aLocali: locazioni eutenze00• 0,00B2.20bImmobili:ammortamenti00• 0,00B2.20cImmobili:manutenzioneordinaria00• 0,00B2.21Attrezzature: noleggi eleasing00• 0,00B2.22Attrezzature:manutenzioni ordinarie00• 0,00B2.23Attrezzature:ammortamenti00• 0,00B2.1Docenza (dipendenti ecollaboratori)00• 0,00B2.14Viaggi di studio deipartecipanti00• 0,00B2.18Materiali diconsumo/materialididattici00• 0,00E1.1UCS ora formazione5.94085• 504.900,00E1.2UCS allievo120403,5• 48.420,00E1.3Costi acofinanziamentoprivato150150• 22.500,00E1.4UCS ora/utente(individuale)15038• 5.700,00"
```
I want to check whether the elements of list1 are contained in string1, and where they are located in the string.
My final goal is to extract from the string the amount for each code: for example, for code "A1" -> 559.020,00, for code "A2.2" -> 22.500,00, and so on.
At first I tried a simple loop:
```python
for code in list1:
    stringPosition = re.search(code, string1)
```
but with this approach I have a problem with codes like B2.2 and B2.20, because both searches return the same position (B2.2 is a prefix of B2.20).
So I tried to figure out how to search for the exact code inside the string. I looked at these posts:
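The problem is easy to reproduce on a verbatim prefix of string1. Two things go wrong with re.search(code, ...): the unescaped "." matches any character, and nothing stops "B2.2" from matching the start of "B2.20a". A minimal sketch of an exact search, assuming a code counts as exact when it is not followed by another digit or a lowercase letter (true for every entry in the sample string); find_exact is a hypothetical helper name:

```python
import re

# Verbatim prefix of string1 from the question
text = ("A1Contributo pubblico1559.020• 559.020,00"
        "A2.2Cofinanziamentoprivato in denaro122.500• 22.500,00"
        "A2.4Entrate generate dalprogetto00• 0,00"
        "B2.20aLocali: locazioni eutenze00• 0,00")

# Naive search: "B2.2" is a prefix of "B2.20a", so both searches hit the same spot
naive_b22 = re.search("B2.2", text)
naive_b220 = re.search("B2.20", text)
print(naive_b22.start() == naive_b220.start())  # True


def find_exact(code, s):
    """Hypothetical helper: find code only where it is not the prefix of a longer code."""
    # re.escape makes "." literal; the lookahead rejects a following digit or lowercase suffix
    return re.search(re.escape(code) + r"(?![0-9a-z])", s)


print(find_exact("B2.2", text))  # None: only B2.20a is in this excerpt
print(find_exact("B2.20a", text).start() == text.index("B2.20a"))  # True
```

If a description could ever start with a digit or lowercase letter right after its code, this assumption would no longer hold.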
How do I check for an EXACT word in a string in python
Match exact phrase within a string in Python
Regex find whole substring between parenthesis containing exact substring
How to search for a word (exact match) within a string?
and I tried to apply the suggested solutions (e.g. using r'\w' + (code) + r'\w'), but without success.
My first attempt:
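Those answers rely on word boundaries, but here the codes are glued to the surrounding text (00A2.2Cofinanziamento...), so there is no boundary to find. Wrapping the code in \w on both sides does not help either: it still matches B2.2 inside ...0B2.20a..., because the digits around it are word characters too. A small check on a verbatim prefix of string1:

```python
import re

# Verbatim prefix of string1 from the question
text = ("A1Contributo pubblico1559.020• 559.020,00"
        "A2.2Cofinanziamentoprivato in denaro122.500• 22.500,00"
        "A2.4Entrate generate dalprogetto00• 0,00"
        "B2.20aLocali: locazioni eutenze00• 0,00")

# \b needs a word/non-word transition; "00A2.2C" has none around the code
print(re.search(r"\bA2\.2\b", text))  # None

# \w on both sides happily matches digits, so B2.2 is found inside "0B2.20a"
m = re.search(r"\w" + re.escape("B2.2") + r"\w", text)
print(m.group())  # 0B2.20
```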
```python
import re

list2 = []
for code in list1:
    stringPosition = re.search(code, string1)
    if stringPosition is not None:
        print(code, stringPosition)
        list2.append(stringPosition)
```
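Keeping the shape of this loop, a minimal sketch that records the exact position and the amount for each code. It assumes, as in the sample string, that every code is followed by its description, then "• " and the amount, and that a code is never directly followed by a digit or lowercase letter; the excerpt and the shortened code list are only for illustration:

```python
import re

# Verbatim prefix of string1 and a few codes from list1 (shortened for the example)
text = ("A1Contributo pubblico1559.020• 559.020,00"
        "A2.2Cofinanziamentoprivato in denaro122.500• 22.500,00"
        "A2.4Entrate generate dalprogetto00• 0,00"
        "B2.20aLocali: locazioni eutenze00• 0,00")
codes = ["A1", "A2.1", "A2.2", "A2.4", "B2.2", "B2.20a"]

list2 = []
for code in codes:
    # exact code, then everything up to the bullet, then the amount (group 1)
    pattern = re.escape(code) + r"(?![0-9a-z])[^•]*• ([\d.,]+)"
    m = re.search(pattern, text)
    if m is not None:
        list2.append((code, m.start(), m.group(1)))

for code, position, amount in list2:
    print(code, position, amount)
```

A2.1 and B2.2 are simply skipped, because they do not occur as exact codes in the excerpt.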
Thanks in advance for any suggestions.
You can do it with the right regex:
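```python
import re

# Assumes list1 and string1 from the question. Group 1 is the code, group 2 the amount after "• ".
MAGIC_REGEX = r"([A-Z]\d+(?:\.\d+[a-z]?)?)[^•]+• ([\d.,]+)"
matches = re.findall(MAGIC_REGEX, string1)
print(matches)

# keep only the codes that are in list1
filtered = list(filter(lambda x: x[0] in list1, matches))
print(filtered)

# "559.020,00" -> 559020.0: drop the thousands points, then turn the decimal comma into a point
number_filtered = list(map(lambda x: (x[0], float(x[1].replace(".", "").replace(",", "."))), filtered))
print(number_filtered)
```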
Since I wasn't sure exactly what you wanted, I split the work into three separate steps:
Right after matching the regex, you already get more or less what you asked for:

```
matches = [('A1', '559.020,00'), ('A2.2', '22.500,00'), ('A2.4', '0,00'), ('B2.20a', '0,00'), ('B2.20b', '0,00'), ('B2.20c', '0,00'), ('B2.21', '0,00'), ('B2.22', '0,00'), ('B2.23', '0,00'), ('B2.1', '0,00'), ('B2.14', '0,00'), ('B2.18', '0,00'), ('E1.1', '504.900,00'), ('E1.2', '48.420,00'), ('E1.3', '22.500,00'), ('E1.4', '5.700,00')]
```
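To see why the pattern splits the string into exactly these pairs, here is the same regex written with re.VERBOSE, one commented piece per line, with the dot escaped (\.) so it only matches a literal point. Turning the pairs into a dict then gives the code -> amount lookup from the question. A sketch on a verbatim prefix of string1; if a code appeared twice, dict() would keep the last amount:

```python
import re

PATTERN = re.compile(r"""
    (                     # group 1: the code
        [A-Z]\d+          #   letter + number, e.g. A1, B2
        (?:\.\d+[a-z]?)?  #   optional .number with optional suffix, e.g. .2, .20a
    )
    [^•]+                 # the description: everything up to the bullet
    •\                    # the bullet and the space after it (VERBOSE ignores bare spaces)
    ([\d.,]+)             # group 2: the amount, e.g. 559.020,00
""", re.VERBOSE)

# Verbatim prefix of string1 from the question
text = ("A1Contributo pubblico1559.020• 559.020,00"
        "A2.2Cofinanziamentoprivato in denaro122.500• 22.500,00"
        "A2.4Entrate generate dalprogetto00• 0,00"
        "B2.20aLocali: locazioni eutenze00• 0,00")

amounts = dict(PATTERN.findall(text))
print(amounts["A1"])    # 559.020,00
print(amounts["A2.2"])  # 22.500,00
```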
filtered keeps only the codes that are in your list (the differences from matches are B2.21 to B2.23 and all the E codes, since they are not in your list):

```
filtered = [('A1', '559.020,00'), ('A2.2', '22.500,00'), ('A2.4', '0,00'), ('B2.20a', '0,00'), ('B2.20b', '0,00'), ('B2.20c', '0,00'), ('B2.1', '0,00'), ('B2.14', '0,00'), ('B2.18', '0,00')]
```
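Since the goal is a lookup by code, the filtering step can also build a dict directly. Converting list1 to a set makes each membership test constant-time instead of a scan of the whole list, and the resulting dict also tells you which codes of list1 never appear in the string (the first half of the question). A sketch using a few of the pairs and codes shown above:

```python
# A few (code, amount) pairs from `matches` above and a few codes from list1
matches = [("A1", "559.020,00"), ("A2.2", "22.500,00"), ("B2.21", "0,00"), ("E1.1", "504.900,00")]
list1 = ["A1", "A2.1", "A2.2", "A2.3"]

wanted = set(list1)  # O(1) membership tests
filtered = {code: amount for code, amount in matches if code in wanted}
missing = [code for code in list1 if code not in filtered]

print(filtered)  # {'A1': '559.020,00', 'A2.2': '22.500,00'}
print(missing)   # ['A2.1', 'A2.3']
```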
number_filtered converts the values to float: you need to remove the thousands separators (the points) and change the decimal comma into a point:

```
number_filtered = [('A1', 559020.0), ('A2.2', 22500.0), ('A2.4', 0.0), ('B2.20a', 0.0), ('B2.20b', 0.0), ('B2.20c', 0.0), ('B2.1', 0.0), ('B2.14', 0.0), ('B2.18', 0.0)]
```
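If the amounts are money and will be summed, decimal.Decimal avoids binary floating-point rounding, and the conversion is the same string manipulation. A sketch, assuming the Italian format of the sample ("." for thousands, "," for decimals); parse_amount is a hypothetical helper name:

```python
from decimal import Decimal


def parse_amount(s: str) -> Decimal:
    """Hypothetical helper: '559.020,00' -> Decimal('559020.00')."""
    # drop the thousands separators, then make the decimal comma a point
    return Decimal(s.replace(".", "").replace(",", "."))


print(parse_amount("559.020,00"))  # 559020.00
total = parse_amount("504.900,00") + parse_amount("48.420,00")
print(total)  # 553320.00
```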