Claus

Reputation: 119

Python Regular Expressions Return Floating Points

For a project I have to extract the RGB values from a file, which are defined as follows:

#71=IFCCOLOURRGB($,0.75,0.73,0.6800000000000001);
#98=IFCCOLOURRGB($,0.26,0.22,0.18);

I want to return the RGB data and write it to a new file like this:

0.75 0.73 0.68

0.26 0.22 0.18

So far I've created this for loop:

import re 

IfcFile = open('IfcOpenHouse.ifc', 'r')

IfcColourRGB = re.compile('ifccolourrgb', re.IGNORECASE)


for rad_rgb_data in IfcFile:
    if re.search(IfcColourRGB, rad_rgb_data):
        print(IfcColourRGB.sub('', rad_rgb_data))

This returns:

#71=($,0.75,0.73,0.6800000000000001);

#98=($,0.26,0.22,0.18);

I am quite new to programming and I want to know whether I've chosen the right approach for my task. I've been reading about regular expressions, but I don't fully understand how to get rid of all the #=(,: characters or how to specify exactly which numbers should be returned and which should not. Is it possible to define each regular expression explicitly/individually and then combine them in one for loop, so that I have an easier time understanding them?

Upvotes: 0

Views: 134

Answers (3)

beroe

Reputation: 12316

I think you are overthinking this :^) You can loop through the lines and perform this search on each.

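import re

# Raw string so the backslashes reach the regex engine unchanged
Searcher = re.compile(r"IFCCOLOURRGB\(\$,([\d.]+),([\d.]+),([\d.]+)")

with open('IfcOpenHouse.ifc') as IfcFile:
    for Line in IfcFile:
        Result = Searcher.search(Line)
        if Result:
            # Tuple of the three captured strings, e.g. ('0.26', '0.22', '0.18')
            print(Result.groups())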

If you are just writing the values back out to a file, you don't need to convert to float afterwards, except to truncate the 00000001 and print to 2 significant figures.
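As a rough sketch of that last step (not part of the answer above), the captured strings could be converted to float only for formatting and then written to a new file. The names RGB_PATTERN, format_rgb_line and convert, and any file paths passed in, are hypothetical; the pattern assumes plain decimal values like those in the question, with no exponent notation.

```python
import re

# Assumed pattern, based on the two sample lines in the question
RGB_PATTERN = re.compile(
    r"IFCCOLOURRGB\(\$,([\d.]+),([\d.]+),([\d.]+)\)", re.IGNORECASE
)


def format_rgb_line(line):
    """Return 'r g b' with two decimals each, or None if the line has no colour."""
    match = RGB_PATTERN.search(line)
    if match is None:
        return None
    # float() is used only so that the values can be formatted to two decimals
    return " ".join("{:.2f}".format(float(value)) for value in match.groups())


def convert(in_path, out_path):
    """Copy every RGB triple found in in_path to out_path, one per line."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            formatted = format_rgb_line(line)
            if formatted is not None:
                dst.write(formatted + "\n")
```

Note that "{:.2f}" always writes two decimals, so a value such as 0.5 would come out as 0.50.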

Upvotes: 0

Andie2302

Reputation: 4887

To extract the colors use:

IFCCOLOURRGB\(\$,(?P<Red>\.[0-9]{1,16}|[0-9]+(?:\.[0-9]{1,16})?),(?P<Green>\.[0-9]{1,16}|[0-9]+(?:\.[0-9]{1,16})?),(?P<Blue>\.[0-9]{1,16}|[0-9]+(?:\.[0-9]{1,16})?)\)

Capturing groups:

Red: value of red, Green: value of green, Blue: value of blue


import re

# subject is the line (or text) to search; \$, matches the leading "$," argument
match = re.search(r"IFCCOLOURRGB\(\$,(?P<Red>\.[0-9]{1,16}|[0-9]+(?:\.[0-9]{1,16})?),(?P<Green>\.[0-9]{1,16}|[0-9]+(?:\.[0-9]{1,16})?),(?P<Blue>\.[0-9]{1,16}|[0-9]+(?:\.[0-9]{1,16})?)\)", subject)
if match:
    result1 = match.group("Red")
    result2 = match.group("Green")
    result3 = match.group("Blue")
else:
    # No colour on this line
    result1 = result2 = result3 = ""
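Since the question asks whether the pieces of a regular expression can be defined individually and then combined, here is one possible sketch of that idea. The names NUMBER, PREFIX, SUFFIX, RGB_PATTERN and parse_rgb are made up for illustration, and NUMBER is a simplified stand-in for the longer number pattern above (it assumes plain decimals without exponents).

```python
import re

# Hypothetical building blocks, each small enough to read on its own
NUMBER = r"(?:\d+(?:\.\d*)?|\.\d+)"   # 0.75, 1, 1. or .5
PREFIX = r"IFCCOLOURRGB\(\$,"         # literal 'IFCCOLOURRGB($,'
SUFFIX = r"\)"                        # literal ')'

# Combine the pieces into one pattern with a named group per channel
RGB_PATTERN = re.compile(
    PREFIX
    + r"(?P<Red>" + NUMBER + r"),"
    + r"(?P<Green>" + NUMBER + r"),"
    + r"(?P<Blue>" + NUMBER + r")"
    + SUFFIX,
    re.IGNORECASE,
)


def parse_rgb(line):
    """Return {'Red': ..., 'Green': ..., 'Blue': ...} as floats, or None."""
    match = RGB_PATTERN.search(line)
    if match is None:
        return None
    return {name: float(value) for name, value in match.groupdict().items()}
```

Calling parse_rgb on each line inside a single for loop keeps the loop short, while each part of the pattern can still be changed or tested separately.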

Upvotes: 0

Kasravnd

Reputation: 107287

You can use re.findall() with a positive look-behind pattern, then split on "," and convert to float, rounding to two decimals to drop the trailing 0000000000000001:

>>> s="""#71=IFCCOLOURRGB($,0.75,0.73,0.6800000000000001);
... #98=IFCCOLOURRGB($,0.26,0.22,0.18);"""
>>> import re
>>> l=re.findall(r'(?<=\$,)[\d\.,]+',s)
>>> [[round(float(x), 2) for x in i.split(',')] for i in l]
[[0.75, 0.73, 0.68], [0.26, 0.22, 0.18]]

Upvotes: 2
