Reputation: 269
I'd like to read all integers from a file into a single list. The numbers are separated by one or more spaces or by one or more newline characters. What is the most efficient and/or elegant way of doing this? I have two solutions, but I don't know whether they are any good.
Checking for digits:
for line in open("foo.txt", "r"):
    for i in line.strip().split(' '):
        if i.isdigit():
            my_list.append(int(i))
Dealing with exceptions:
for line in open("foo.txt", "r"):
    # Split on whitespace so that multi-digit numbers stay intact
    for i in line.split():
        try:
            my_list.append(int(i))
        except ValueError:
            pass
Sample data:
1 2 3
4 56
789
9 91 56
10
11
Upvotes: 11
Views: 5469
Reputation: 40763
This was the fastest way I found:
import re
regex = re.compile(r"\D+")
with open("foo.txt", "r") as f:
    # Skip the empty strings split() yields when the text starts or ends with a non-digit
    my_list = [int(s) for s in regex.split(f.read()) if s]
Though the results could depend on the size of the file.
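A possible variation, sketched on an in-memory string rather than a file: splitting on `\D+` leaves an empty string whenever the text starts or ends with a non-digit (such as a trailing newline), whereas `re.findall(r"\d+", ...)` returns only the runs of digits. The `sample` string is made up for illustration.

```python
import re

# Hypothetical sample text; note the doubled space and the trailing newline.
sample = "1 2  3\n4 56\n\n789\n"

# Splitting on non-digits leaves an empty string at the end.
pieces = re.split(r"\D+", sample)

# findall returns only the runs of digits, so nothing needs filtering.
numbers = [int(s) for s in re.findall(r"\d+", sample)]

print(pieces)
print(numbers)
```

Note that `findall` also pulls digits out of mixed tokens such as `a1b`, which the `isdigit()` approaches would skip entirely.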
Upvotes: 0
Reputation: 90979
An efficient way of doing it would be your first method, with the small change of using a with statement to open the file. Example:
with open("foo.txt", "r") as f:
    for line in f:
        for i in line.split():
            if i.isdigit():
                my_list.append(int(i))
Timing tests comparing this with the other methods follow.
The functions:
import csv  # needed by func3

def func1():
    my_list = []
    for line in open("foo.txt", "r"):
        for i in line.strip().split(' '):
            if i.isdigit():
                my_list.append(int(i))
    return my_list

def func1_1():
    return [int(i) for line in open("foo.txt", "r") for i in line.strip().split(' ') if i.isdigit()]

def func1_3():
    my_list = []
    with open("foo.txt", "r") as f:
        for line in f:
            for i in line.split():
                if i.isdigit():
                    my_list.append(int(i))
    return my_list

def func2():
    my_list = []
    for line in open("foo.txt", "r"):
        for i in line.split():
            try:
                my_list.append(int(i))
            except ValueError:
                pass
    return my_list

def func3():
    my_list = []
    with open("foo.txt", "r") as f:
        cf = csv.reader(f, delimiter=' ')
        for row in cf:
            my_list.extend([int(i) for i in row if i.isdigit()])
    return my_list
Results of timing tests -
In [25]: timeit func1()
The slowest run took 4.70 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 204 µs per loop
In [26]: timeit func1_1()
The slowest run took 4.39 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 207 µs per loop
In [27]: timeit func1_3()
The slowest run took 5.46 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 191 µs per loop
In [28]: timeit func2()
The slowest run took 4.09 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 212 µs per loop
In [34]: timeit func3()
The slowest run took 4.38 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 202 µs per loop
Among the methods that store the data in a list, func1_3() was the fastest in the timeit runs above, though the margins are small.
That said, if you are really handling very large files, you may be better off using a generator rather than storing the complete list in memory.
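A minimal sketch of the generator idea, assuming the same whitespace-separated format. The name `iter_ints` and the temporary demo file are hypothetical and only serve the illustration.

```python
import os
import tempfile

def iter_ints(path):
    """Yield integers from a whitespace-separated text file, one at a time."""
    with open(path, "r") as f:
        for line in f:
            for token in line.split():
                if token.isdigit():
                    yield int(token)

# Demo with a throwaway file holding data shaped like the question's sample.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1 2 3\n4 56\n    789\n9          91 56   \n 10 \n11 \n")
    demo_path = tmp.name

# sum() consumes the values one by one without building a list.
total = sum(iter_ints(demo_path))
os.remove(demo_path)
print(total)
```

If a list is needed after all, `list(iter_ints(path))` produces one, so the generator does not rule out the original use case.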
UPDATE: It was said in the comments that func2() is faster than func1_3() (though on my system it was never faster than func1_3(), even for a file containing only integers), so I updated foo.txt to contain things other than numbers and reran the timing tests:
foo.txt
1 2 10 11
asd dd
dds asda
22 44 32 11 23
dd dsa dds
21 12
12
33
45
dds
asdas
dasdasd dasd das d asda sda
Test -
In [13]: %timeit func1_3()
The slowest run took 6.17 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 210 µs per loop
In [14]: %timeit func2()
1000 loops, best of 3: 279 µs per loop
In [15]: %timeit func1_3()
1000 loops, best of 3: 213 µs per loop
In [16]: %timeit func2()
1000 loops, best of 3: 273 µs per loop
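For anyone who wants to repeat this kind of comparison outside IPython, here is a rough sketch using the standard `timeit` module. The function names `with_isdigit` and `with_try`, the file contents, and the repeat count of 200 are assumptions chosen for illustration; they are not the exact setup used for the numbers above.

```python
import os
import tempfile
import timeit

def with_isdigit(path):
    """Keep only tokens that consist entirely of digits."""
    with open(path, "r") as f:
        return [int(t) for line in f for t in line.split() if t.isdigit()]

def with_try(path):
    """Attempt int() on every token and ignore the failures."""
    result = []
    with open(path, "r") as f:
        for line in f:
            for t in line.split():
                try:
                    result.append(int(t))
                except ValueError:
                    pass
    return result

# Throwaway file mixing numbers and words (contents are made up for this sketch).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1 2 10 11\nasd dd\n22 44 32\ndds asda\n12\n")
    path = tmp.name

result_isdigit = with_isdigit(path)
result_try = with_try(path)

# Total seconds for 200 calls each; absolute numbers vary by machine.
time_isdigit = timeit.timeit(lambda: with_isdigit(path), number=200)
time_try = timeit.timeit(lambda: with_try(path), number=200)
os.remove(path)

print(result_isdigit == result_try)
print(f"isdigit: {time_isdigit:.4f}s, try/except: {time_try:.4f}s")
```

On such a small file the timings are dominated by opening the file, so a larger input is likely needed before the difference between the two approaches becomes meaningful.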
Upvotes: 7
Reputation: 9969
It's pretty easy if you can read the whole file as a string (i.e. it's not too large to do that).
fileStr = open('foo.txt').read().split()
integers = [int(x) for x in fileStr if x.isdigit()]
read() turns the file into one long string, and split() splits it into a list of strings on whitespace (i.e. spaces and newlines). So you can combine that with a list comprehension that converts them to integers if they're digits.
As Bakuriu noted, if the file is guaranteed to contain only whitespace and numbers, then you don't have to check isdigit(). Using list(map(int, open('foo.txt').read().split())) would be enough in that case. That method will raise an error if anything is not a valid integer, whereas the other will skip anything that isn't made up entirely of digits.
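A small sketch of that difference on an in-memory list of tokens. The token values are made up; the negative number is included as an assumption to show that `isdigit()` rejects a leading minus sign.

```python
tokens = "12 abc -5 7".split()

# Filtering with isdigit() silently skips anything that is not all digits,
# which includes the negative number "-5".
filtered = [int(t) for t in tokens if t.isdigit()]

# Converting every token fails on the first invalid one.
try:
    strict = list(map(int, tokens))
except ValueError as exc:
    strict = None
    error = str(exc)

print(filtered)
print(strict)
```

Which behaviour is preferable depends on whether unexpected content in the file should be ignored or reported.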
Upvotes: 5
Reputation: 7369
Try this:
with open('file.txt') as f:
    nums = []
    for l in f:
        l = l.strip()
        nums.extend([int(i) for i in l.split() if i.isdigit() and l])
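l.strip() removes the trailing newline ('\n'), since '6\n'.isdigit() returns False. Strictly speaking it is redundant here, because l.split() with no argument already discards all whitespace, but it would be needed with l.split(' ').
list.extend comes in handy here.
The "and l" at the end skips empty lines; it is optional as well, since an empty line yields no tokens anyway.
str.split splits on whitespace by default, and the with block automatically closes the file after the code within it is executed. I've also made use of list comprehensions.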
Upvotes: 3
Reputation: 269
Thank you all. I've mixed some solutions you posted. This seems very good to me:
with open("foo.txt","r") as f:
    my_list = [int(i) for line in f for i in line.split() if i.isdigit()]
Upvotes: 4
Reputation: 8335
You could do it like this using a list comprehension:
my_list = [int(i) for j in open("1.txt","r") for i in j.strip().split(" ") if i.isdigit()]
Or using a with statement:
with open("1.txt","r") as f:
    my_list = [int(i) for j in f for i in j.strip().split(" ") if i.isdigit()]
Process:
1. First, iterate over the lines of the file.
2. Then iterate over the words in each line and check whether each one consists of digits; if so, add it to the list.
Edit:
You need to apply strip() to each line because every line (except possibly the last) ends with a newline character ("\n"), and split(" ") leaves it attached to the last word. Calling isdigit() on "number\n" then returns False.
For example:
>>> "1\n".isdigit()
False
edit2:
Input:
1
qw 2
23 we 32
File data when read:
a=open("1.txt","r")
repr(a.read())
"'1\\nqw 2\\n23 we 32'"
You can see the "\n" newline characters; they affect the processing.
When I run the comprehension without strip(), it does not treat 1 and 2 as digits, because those tokens still end with a newline character:
my_list = [int(i) for j in open("1.txt","r") for i in j.split(" ") if i.isdigit()]
my_list
[23, 32]
From the output it is clear that 1 and 2 are missing. This can be avoided by using strip().
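The effect can be reproduced on a single string. This sketch, with a made-up `line` value, also shows that calling `split()` with no argument handles both the newline and repeated spaces, so `strip()` is only needed when splitting on `" "` explicitly.

```python
line = "qw  2\n"  # two spaces between the words and a trailing newline

# split(" ") keeps the newline attached and yields an empty string
# for the doubled space.
by_space = line.split(" ")

# split() with no argument splits on any run of whitespace and drops the newline.
by_whitespace = line.split()

print(by_space)
print(by_whitespace)
print([t.isdigit() for t in by_space])
print([t.isdigit() for t in by_whitespace])
```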
Upvotes: 3
Reputation: 2975
my_list = []
with open('foo.txt') as f:
    for line in f:
        for s in line.split():
            try:
                my_list.append(int(s))
            except ValueError:
                pass
Upvotes: 3
Reputation: 1161
Why not use the yield keyword? The code would look like this:
def readInt():
    for line in open("foo.txt", "r"):
        for i in line.strip().split(' '):
            if i.isdigit():
                yield int(i)
Then you can read the numbers like this:
my_list = []  # append to a list instance, not to the built-in list type
for num in readInt():
    my_list.append(num)
Upvotes: 3