chris penguin

Reputation: 1

Extract string from file at specific line in Python

I'm trying to extract unit information from a text file. This function always returns 'm' regardless of the real unit in the file. What am I doing wrong?

def get_seba_unit(file):
    with open(file) as f:
        unit = ''
        lines = f.readlines()
        if lines[10].find('m'):
            unit = 'm'
        elif lines[10].find('cm'):
            unit = 'cm'
        elif lines[10].find('°C'):
            unit = '°C'
        print('found Unit: ' + unit + ' for sensor: ' + file)
        return(unit)

Upvotes: 0

Views: 306

Answers (2)

BPL

Reputation: 9863

If what you're looking for is a way to extract units from your data, I'd use a simple regex like the one below:

import io
import re
from collections import defaultdict

data = io.StringIO("""

1cm

2m

3°C

1cm 10cm

2m 20m

3°C           30°C

""")


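def get_seba_unit(file):
    # Matches an optionally signed decimal such as -1.5 or .5, or an unsigned integer.
    floating_point_regex = r"([-+]?\d*\.\d+|\d+)"
    content = file.read()
    res = defaultdict(set)

    for suffix in ['cm', 'm', '°C']:
        # re.escape keeps the suffix literal if it ever contains regex metacharacters.
        p = re.compile(floating_point_regex + re.escape(suffix))
        matches = p.findall(content)
        for m in matches:
            res[suffix].add(m)

    return dict(res)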

print(get_seba_unit(data))

And you'd get an output like this one:

{'cm': {'1', '10'}, '°C': {'3', '30'}, 'm': {'2', '20'}}

Of course, the above code assumes that each unit is directly preceded by a number, but the main idea is to attack this problem with regular expressions.

Upvotes: 0

Arkady

Reputation: 15079

This does not do what you think it does:

if lines[10].find('m'):

find returns the index of the thing you are looking for, or -1 if it's not found. So unless m is the first character on the line (index 0), your condition will always be True (in Python, any non-zero number is truthy).
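A quick check makes the behaviour visible. The sample strings below are made up for illustration, since the question does not show the file's contents:

```python
# Hypothetical sample lines (assumptions, not taken from the real file).
line_with_cm = "Unit: cm"
line_starting_with_m = "m above sea level"
line_without_m = "Unit: °C"

# str.find returns an index, or -1 when the substring is absent.
idx_found = line_with_cm.find('m')            # a positive index
idx_first = line_starting_with_m.find('m')    # 0, because 'm' is the first character
idx_missing = line_without_m.find('m')        # -1, because there is no 'm'

# bool() shows how each result behaves inside an if statement.
print(idx_found, bool(idx_found))
print(idx_first, bool(idx_first))
print(idx_missing, bool(idx_missing))
```

Note that the "not found" case is truthy and the "found at the start" case is falsy, which is the opposite of what the if statement in the question expects.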

You might want to try if 'm' in lines[10] instead.

Also, check for cm before m. Every line that contains cm also contains m, so otherwise you'll never find cm.
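Putting both points together, a minimal sketch of the function could look like this. The line_index parameter, the UTF-8 encoding, and returning an empty string when nothing matches are assumptions of mine, not something the question specifies:

```python
def get_seba_unit(path, line_index=10):
    """Return the unit found on the given line, or '' if none matches."""
    # Assumption: the file is UTF-8 encoded (needed for the '°' character).
    with open(path, encoding='utf-8') as f:
        lines = f.readlines()

    # Guard against files that are shorter than expected.
    if line_index >= len(lines):
        return ''

    line = lines[line_index]
    # 'cm' must be tested before 'm', since 'm' is a substring of 'cm'.
    for unit in ('cm', '°C', 'm'):
        if unit in line:
            return unit
    return ''
```

Keep in mind that the membership test matches any m on the line, including one inside an ordinary word, so this only works if the line holds little besides the unit.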

Upvotes: 1
