labmat

Reputation: 193

Read numbers from many text files and average them Python

I have just started with Python and I have about 6000 .txt files, each containing a few numbers in a column, like:

file1.txt:

2
43
78

file2.txt:

98
12

and so on

I want to read them, store them in one array, and calculate its mean: the mean of (2, 43, 78, 98, 12, ...), i.e. all numbers from all files should give a single mean. When I read and store them, they look like:

['2, 43, 78', '98, 12',..]

... (I got rid of the '\n'.) But when I use ave = sum(a)/float(len(a)) I get an error. What am I doing wrong? Is there anything I missed, or is there another way to do this?

Code:

import fnmatch
import os

rootPath = 'D:/Data'
pattern = '*.txt'
all_data = []
for root, dirs, files in os.walk(rootPath):
    for filename in fnmatch.filter(files, pattern):
        #print( filename )
        name = os.path.join(root, filename)
        str = open(name, 'r').read()
        #print str
        all_data.append(str)
a=[item.replace('\n', ' ') for item in all_data]
#print a
for val in a:
    values = map(float, val.split(", "))
    ave = sum(values)/len(values)
    print ave

I get error:

invalid literal for float()

Upvotes: 1

Views: 1257

Answers (4)

Graipher

Reputation: 7186

sum("abc") is not defined. Neither is sum("2, 43"). sum works only on numeric types.

You need to split each line first and convert the pieces to a numeric type (I used float here, because then the sum will be a float, so there is no need to convert the len to a float):

rows = ['2 43 78', '98 12']
total_sum = total_len = 0
for row in rows:
    values = map(float, row.split())
    total_sum += sum(values)
    total_len += len(values)
print total_sum/total_len

For Python 3.x, replace the print statement with print(total_sum/total_len) and wrap the map in list(), because len is not defined for a map object.

This is similar to what @VadimK has in his answer, but it avoids list addition and just adds numbers to two running totals instead.
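For reference, here is a minimal Python 3 sketch of the same running-total idea. The function name mean_of_rows and the empty-input check are my own additions for illustration and are not part of the code above.

```python
def mean_of_rows(rows):
    """Return the mean of all whitespace-separated numbers in rows."""
    total_sum = 0.0
    total_len = 0
    for row in rows:
        # list() is needed in Python 3 because map() returns a lazy iterator
        values = list(map(float, row.split()))
        total_sum += sum(values)
        total_len += len(values)
    if total_len == 0:
        # Assumption: treat "no numbers at all" as an error rather than return 0
        raise ValueError("no numbers found")
    return total_sum / total_len


rows = ['2 43 78', '98 12']
print(mean_of_rows(rows))  # 46.6
```

Because split() with no argument splits on any whitespace, this should also accept the raw file contents (with '\n' still in them), so the replace step from the question would not be needed.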

Upvotes: 3

Rolf of Saxony

Reputation: 22443

Simple method using glob (shown on Linux, but glob is cross-platform):

import glob
tot_list=[]
for i in glob.glob('*.txt'):        #Return a list of .txt files in current directory
#    print('file:', i)
    with open(i) as f:              #Open file, read lines
        lines = f.readlines()
        for x in lines:             # process each line
            try:
                x=int(x)            #Test for integer value
                tot_list.append(x)  #Include in list
            except ValueError:      #Skip blank or non-integer lines
                pass
print('Total:',sum(tot_list),'No of Items:',len(tot_list))
print('Mean : %.2f' % (sum(tot_list)*1.0/len(tot_list))) #Print floating point result to 2 decimal places

Upvotes: 0

Vadim K

Reputation: 129

I think it would be better to convert the numbers right after reading each file, like:

total_list = []
for file in files:
    str_list = file.read().splitlines() # ['1', '2', '3', '4', '5', '6']
    int_list = map(int, str_list) # [1, 2, 3, 4, 5, 6]
    total_list += int_list
ave = sum(total_list) / float(len(total_list))
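The snippet above assumes that files is an iterable of already-open file objects. One possible way to tie it to the directory walk from the question is sketched below; mean_of_txt_files is a hypothetical helper name, and the empty-result check is an assumption of mine.

```python
import fnmatch
import os


def mean_of_txt_files(root_path, pattern='*.txt'):
    """Average every integer found in matching files under root_path."""
    total_list = []
    for root, dirs, files in os.walk(root_path):
        for filename in fnmatch.filter(files, pattern):
            # The with block closes each file, which matters with ~6000 files
            with open(os.path.join(root, filename)) as f:
                # split() with no argument also skips blank lines
                total_list.extend(map(int, f.read().split()))
    if not total_list:
        # Assumption: no matching files or no numbers is treated as an error
        raise ValueError("no numbers found under %r" % root_path)
    return sum(total_list) / float(len(total_list))
```

With the layout from the question, it would be called as mean_of_txt_files('D:/Data'). If the files may contain decimals, int would need to be swapped for float.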

Upvotes: 1

Moinuddin Quadri

Reputation: 48067

A simple approach using a list comprehension:

>>> my_list = ['2, 43, 78', '98, 12']
>>> my_nums = [float(j) for i in my_list for j in i.split(', ')] 
>>> avg = sum(my_nums)/float(len(my_nums))
>>> avg
46.6
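Note that split(', ') only works if the strings really are separated by a comma and a space. The code in the question replaces '\n' with ' ', so its strings are probably space-separated, which would explain the invalid literal error. A hedged variant that accepts commas, spaces or newlines is sketched below; parse_numbers is a name made up for this example.

```python
def parse_numbers(chunks):
    """Flatten strings of comma- or whitespace-separated numbers into floats."""
    return [
        float(token)
        for chunk in chunks
        # Turn commas into spaces, then split on any run of whitespace
        for token in chunk.replace(',', ' ').split()
    ]


my_list = ['2, 43, 78', '98 12\n']
my_nums = parse_numbers(my_list)
avg = sum(my_nums) / len(my_nums)
print(avg)  # 46.6
```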

Upvotes: 2
