I have just started Python, and I have about 6000 .txt files, each containing a few numbers in a column, like:
file1.txt:
2
43
78
file2.txt:
98
12
and so on
I want to read them, store them in an array, and calculate the mean. The mean of (2, 43, 78, 98, 12, ...), i.e. of all numbers from all files, should give a single value. When I read and store them, they look like:
['2, 43, 78', '98, 12',..]
... (I got rid of the '\n')
But when I use ave = sum(a)/float(len(a)) I get an error.
What am I doing wrong?
Is there anything I missed, or is there another way to do this?
Code:
import fnmatch
import os

rootPath = 'D:/Data'
pattern = '*.txt'
all_data = []
for root, dirs, files in os.walk(rootPath):
    for filename in fnmatch.filter(files, pattern):
        #print( filename )
        name = os.path.join(root, filename)
        str = open(name, 'r').read()
        #print str
        all_data.append(str)
a = [item.replace('\n', ' ') for item in all_data]
#print a
for val in a:
    values = map(float, val.split(", "))
    ave = sum(values)/len(values)
    print ave
I get this error:
invalid literal for float()
sum("abc") does not work, and neither does sum("2, 43"): sum only adds up numbers, so passing it a string raises a TypeError.
You need to split the line first and convert the values to a numeric type (I used float here, because then the sum will be a float, so there is no need to convert the len to a float):
rows = ['2 43 78', '98 12']
total_sum = total_len = 0
for row in rows:
    values = map(float, row.split())
    total_sum += sum(values)
    total_len += len(values)
print total_sum/total_len
For Python 3.x, replace print total_sum/total_len with print(total_sum/total_len) and wrap the map in list(), because in Python 3 map returns an iterator, for which len is not defined.
This is similar to what @VadimK has in his answer, but it avoids list addition and just adds numbers to running totals instead.
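A Python 3 sketch of the same running-totals idea applied to the files themselves, assuming the layout described in the question (a root folder such as D:/Data holding .txt files with one number per line). The function name mean_of_txt_files is hypothetical. Calling str.split() with no argument splits on any whitespace, including newlines, so the '\n' characters do not need to be replaced first, and because the totals are kept across all files the result is one overall mean rather than one mean per file.

```python
import fnmatch
import os


def mean_of_txt_files(root_path, pattern="*.txt"):
    """Return the mean of every number in every matching file under root_path.

    Hypothetical helper: walks the tree like the question's code, but keeps
    running totals across all files instead of printing one average per file.
    """
    total_sum = 0.0
    total_len = 0
    for root, dirs, files in os.walk(root_path):
        for filename in fnmatch.filter(files, pattern):
            with open(os.path.join(root, filename)) as f:
                # split() with no argument splits on any run of whitespace,
                # so '2\n43\n78\n' becomes ['2', '43', '78']
                values = [float(token) for token in f.read().split()]
            total_sum += sum(values)
            total_len += len(values)
    if total_len == 0:
        raise ValueError("no numbers found under %r" % root_path)
    return total_sum / total_len
```

Usage: print(mean_of_txt_files('D:/Data')).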
Simple method using glob, for Linux:
import glob

tot_list = []
for i in glob.glob('*.txt'): #Return a list of .txt files in current directory
    # print('file:', i)
    with open(i) as f: #Open file, read lines
        lines = f.readlines()
    for x in lines: # process each line
        try:
            x = int(x) #Test for integer value
            tot_list.append(x) #Include in list
        except ValueError:
            pass #Skip lines that are not integers (e.g. blank lines)
print('Total:', sum(tot_list), 'No of Items:', len(tot_list))
print('Mean : %.2f' % (sum(tot_list)*1.0/len(tot_list))) #Print floating point result to 2 decimal places
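glob.glob('*.txt') only looks in the current directory, while the question walks every subfolder of D:/Data. On Python 3.5+, glob can recurse with a '**' pattern and recursive=True. A sketch of the same try/except filtering under that assumption; sum_and_count is a hypothetical name, and unlike the answer above it parses floats rather than ints so that decimal values are not skipped.

```python
import glob
import os


def sum_and_count(root_path):
    """Hypothetical helper: total and count of the numeric lines in all .txt
    files under root_path, including subfolders (needs Python 3.5+)."""
    total = 0.0
    count = 0
    # with recursive=True, '**' matches zero or more nested directories,
    # so files directly in root_path are included too
    for path in glob.glob(os.path.join(root_path, "**", "*.txt"), recursive=True):
        with open(path) as f:
            for line in f:
                try:
                    total += float(line)  # float() ignores surrounding whitespace
                    count += 1
                except ValueError:
                    pass  # skip blank or non-numeric lines
    return total, count
```

Usage: total, count = sum_and_count('D:/Data'), then print('Mean : %.2f' % (total / count)).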
I think it is better to convert the numbers right after reading each file, like:
total_list = []
for file in files:  # files: an iterable of open file objects
    str_list = file.read().splitlines()  # ['1', '2', '3', '4', '5', '6']
    int_list = map(int, str_list)  # [1, 2, 3, 4, 5, 6] (an iterator in Python 3)
    total_list += int_list  # list += accepts any iterable
ave = sum(total_list) / float(len(total_list))
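The snippet above assumes files already holds open file objects. A self-contained sketch that opens each path itself, assuming the paths have been collected beforehand (for example with glob or os.walk); mean_from_paths is a hypothetical name. It also skips empty lines, which int() would otherwise reject.

```python
def mean_from_paths(paths):
    """Hypothetical helper: mean of all integers in the given text files,
    assuming one integer per line."""
    total_list = []
    for path in paths:
        with open(path) as f:
            str_list = f.read().splitlines()  # e.g. ['2', '43', '78']
        # convert right after reading; blank lines are filtered out first
        total_list += map(int, (s for s in str_list if s.strip()))
    return sum(total_list) / float(len(total_list))
```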
A simple approach using a list comprehension:
>>> my_list = ['2, 43, 78', '98, 12']
>>> my_nums = [float(j) for i in my_list for j in i.split(', ')]
>>> avg = sum(my_nums)/float(len(my_nums))
>>> avg
46.6
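On Python 3.4+, the standard library's statistics.mean can replace the manual sum/len step. A sketch using the same my_list as above; one difference is that on an empty list it raises statistics.StatisticsError instead of ZeroDivisionError.

```python
import statistics

my_list = ['2, 43, 78', '98, 12']
# flatten the comma-separated strings into one list of floats
my_nums = [float(j) for i in my_list for j in i.split(', ')]
avg = statistics.mean(my_nums)
print(avg)  # 46.6
```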