Abdullah Ergun

Reputation: 73

How to get one column of a text file

I have a txt file which contains different types of readings. I would like to display the minimum, maximum and average values of one of the readings.

An example of the file's contents:

19-05-2020 17:23:15|25.10c,52.00%rh
19-05-2020 17:23:15|25.10c,53.00%rh
19-05-2020 17:23:15|25.20c,54.00%rh
19-05-2020 17:23:15|25.30c,55.00%rh

I would like to display the minimum and maximum values of the Celsius readings only.

I have the code below, but it reads each entire line. I want it to read only the Celsius readings.

_min = None
_max = None
_sum = 0
_len = 0
with open('numaralar.txt') as f:
    for line in f:
        val = int(line.strip())
        if _min is None or val < _min:
            _min = val
        if _max is None or val > _max:
            _max = val
        _sum += val
        _len += 1

_avg = float(_sum) / _len

# Print output
print("Min: %s" % _min)  
print("Max: %s" % _max)  
print("Avg: %s" % _avg)

Upvotes: 0

Views: 373

Answers (3)

wjandrea

Reputation: 33107

You can solve this without regex, but it's a bit of a pain. You have to split on the pipe "|" and take everything after it, then split that on "c" and take everything before it.
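To see what each split produces, here is the chain applied step by step to one sample line from the question:

```python
line = "19-05-2020 17:23:15|25.10c,52.00%rh"

after_pipe = line.split('|')[1]      # everything after "|": '25.10c,52.00%rh'
before_c = after_pipe.split('c')[0]  # everything before "c": '25.10'
celsius = float(before_c)            # 25.1
print(after_pipe, before_c, celsius)
```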

Borrowing from Sushanth's answer:

with open("numaralar.txt") as f:
    vals = [float(line.split('|')[1].split('c')[0]) for line in f]
# vals = [25.1, 25.1, 25.2, 25.3]

# Use built-in functions to get the required values.
print("Min:", min(vals))
print("Max:", max(vals))
print("Avg:", sum(vals)/len(vals))

That said, splitting makes more sense when you're processing all the columns, e.g.:

with open("numaralar.txt") as f:
    for line in f:
        time, data = line.strip().split('|')
        temp, humidity = data.split(',')
        temp = float(temp.rstrip('c'))
        humidity = float(humidity.rstrip('%rh'))
        print(time, temp, humidity)
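Extending that idea, a minimal sketch that collects both columns and reports min, max and average for each, assuming the same line layout. The in-memory sample is a hypothetical stand-in for the file, and `statistics.mean` from the standard library replaces the manual sum/len. `rstrip('%rh')` strips any trailing "%", "r" or "h" characters (it takes a character set, not a suffix), which is enough for this data.

```python
import io
import statistics

# Hypothetical in-memory copy of the sample data; replace with open("numaralar.txt").
sample = io.StringIO(
    "19-05-2020 17:23:15|25.10c,52.00%rh\n"
    "19-05-2020 17:23:15|25.10c,53.00%rh\n"
    "19-05-2020 17:23:15|25.20c,54.00%rh\n"
    "19-05-2020 17:23:15|25.30c,55.00%rh\n"
)

temps = []
humidities = []
for line in sample:
    line = line.strip()
    if not line:
        continue  # ignore blank lines
    timestamp, data = line.split('|')
    temp, humidity = data.split(',')
    temps.append(float(temp.rstrip('c')))
    humidities.append(float(humidity.rstrip('%rh')))

# Report the same three statistics for each column.
for name, values in (("Temp", temps), ("Humidity", humidities)):
    print(name, "min:", min(values), "max:", max(values), "avg:", statistics.mean(values))
```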

Upvotes: 0

sushanth

Reputation: 8302

This is one way of getting the values, using a regex:

import re

# Extract the number between "|" and "c" (e.g. "|25.10c") and convert it to float.
with open("numaralar.txt") as f:
    # "re.findall" returns every captured value that matches the pattern;
    # the raw string keeps the "\|" escape intact for the re module
    vals = [float(x) for x in re.findall(r"\|(.*)c", f.read())]
# vals = [25.1, 25.1, 25.2, 25.3]

# Use built-in functions to get the required values.
print("Min:", min(vals))
print("Max:", max(vals))
print("Avg:", sum(vals)/len(vals))
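One caveat: `.*` in the pattern `\|(.*)c` is greedy, so on each line it runs to the *last* "c" before the newline. It works on this data only because no other "c" follows the temperature. A slightly stricter sketch (the `[\d.]+` pattern and the extra "calibrated" line are illustrative assumptions, not part of the answer) captures only digits and dots right before the "c":

```python
import re

text = (
    "19-05-2020 17:23:15|25.10c,52.00%rh\n"
    "19-05-2020 17:23:15|25.10c,53.00%rh\n"
    "19-05-2020 17:23:15|25.20c,54.00%rh\n"
    "19-05-2020 17:23:15|25.30c,55.00%rh\n"
)

# Hypothetical line with an extra "c" later on, to show the greedy overrun.
bad_line = "x|25.10c,52.00%rh,calibrated"
greedy = re.findall(r"\|(.*)c", bad_line)  # captures up to the "c" in "calibrated"

# Stricter pattern: only digits and dots between "|" and "c".
strict = re.compile(r"\|([\d.]+)c")
strict_on_bad = strict.findall(bad_line)
vals = [float(x) for x in strict.findall(text)]

print(greedy)
print(strict_on_bad)
print(vals)
```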

Upvotes: 1

xana

Reputation: 499

I would recommend using a regular expression to extract the Celsius reading, something like this:

[0-9]{1,}\.[0-9]{2}c

Then remove the "c" and convert the rest to a float. After that you can do other operations on the data.

A regular expression can extract a specific piece of a string that follows a pattern. In your example: digits, a dot, digits, "c".

The pattern above means:

  • [0-9]{1,} - one or more digits
  • \. - then one dot (escaped with a backslash, because a bare dot in regex means "any character")
  • [0-9]{2} - then exactly two digits
  • c - the letter "c" at the end
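A variant of that pattern (the capture group is an addition, not part of the original answer): wrapping the number in parentheses lets `group(1)` return it without the "c", so there is nothing to strip afterwards.

```python
import re

# Same pattern, with the number wrapped in a capture group;
# the "c" is still matched but left out of group(1).
pattern = re.compile(r"([0-9]{1,}\.[0-9]{2})c")

match = pattern.search("19-05-2020 17:23:15|25.10c,52.00%rh")
celsius = float(match.group(1))
print(match.group(0), match.group(1), celsius)
```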

See "Python extract pattern matches" for how to extract such data with the re module.

import re

readings = [
    '19-05-2020 17:23:15|25.10c,52.00%rh',
    '19-05-2020 17:23:15|25.10c,53.00%rh',
    '19-05-2020 17:23:15|25.20c,54.00%rh',
    '19-05-2020 17:23:15|25.30c,55.00%rh'
]

# Compile the pattern once, outside the loop; the raw string keeps "\." intact.
pattern = re.compile(r'[0-9]{1,}\.[0-9]{2}c')

temperatures = []

for reading in readings:
    temperature = pattern.search(reading).group(0)
    temperature = temperature[:-1]  # removes the last character, which is "c"
    temperature = float(temperature)
    temperatures.append(temperature)

print(temperatures)
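The loop above stops at the list of temperatures; the question also asks for the minimum, maximum and average. A minimal sketch of that last step, with the list hard-coded to the values the loop collects from the four sample readings:

```python
import statistics

# Values the loop above collects from the four sample readings.
temperatures = [25.1, 25.1, 25.2, 25.3]

t_min = min(temperatures)
t_max = max(temperatures)
t_avg = statistics.mean(temperatures)
print("Min:", t_min)
print("Max:", t_max)
print("Avg:", t_avg)
```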

Upvotes: 0
