VacuuM

Reputation: 111

How to handle large files in python?

I am new to Python. I previously asked another question, "How to arrange three lists in such a way that the sum of corresponding elements if greater then appear first?" Now the problem is the following:

I am working with a large text file that has 419040 rows and 6 columns of floats. I take the first 3 columns to generate those three lists, so each list has 419040 entries. When I ran the Python code to extract the three columns into three lists, the Python shell stopped responding. I suspected the large number of entries was the cause. I used this code:

a = []
b = []
c = []
# Open the file in a with-block so it is closed automatically.
with open("file_location", "r") as f:
    for line in f:
        x = line.split(" ")
        a.append(float(x[0]))
        b.append(float(x[1]))
        c.append(float(x[2]))

Note: for a small file this code ran perfectly. To avoid this problem I am now using the following code:

import numpy as np
a = []
b = []
c = []
a,b,c = np.genfromtxt('file_location',usecols = [0,1,2], unpack=True)

When I run the code given in the answers to my previous question, the same problem happens. What would the corresponding code be using numpy? Or are there any other solutions?

Upvotes: 0

Views: 150

Answers (1)

Praveen

Reputation: 7222

If you're going to use numpy, then I suggest using ndarrays, rather than lists. You can use loadtxt since you don't have to handle missing data. I assume it'll be faster.

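import numpy as np
a = np.loadtxt('file.txt', usecols=(0, 1, 2))

a is now a two-dimensional array, stored as an np.ndarray. loadtxt returns floats by default, so the real output would show values such as 1. and 400.; integers are used here for readability. It should look like: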

>>> a
array([[  1,  20, 400],
       [  5,  30, 500],
       [  3,  50, 100],
       [  2,  40, 300],
       [  4,  10, 200]])
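
To make the loading step concrete, here is a minimal sketch that reads from an in-memory string instead of the real file. The sample values and the name sample are made up for illustration, and the file is assumed to be whitespace-separated with six columns, as described in the question.

```python
import io
import numpy as np

# Hypothetical stand-in for the real file: six whitespace-separated columns.
sample = io.StringIO(
    "1 20 400 0 0 0\n"
    "5 30 500 0 0 0\n"
    "3 50 100 0 0 0\n"
)

# Read only the first three columns; loadtxt returns floats by default.
a = np.loadtxt(sample, usecols=(0, 1, 2))

print(a.shape)  # (3, 3)
print(a.dtype)  # float64

# The individual columns are still available as 1-D arrays if needed.
col0, col1, col2 = a.T
```

Keeping the data in one 2-D array rather than three separate lists means row-wise operations, such as summing across columns, can be done in a single call.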

However, you now need to re-do what you did in the previous question, but using numpy arrays rather than lists. This can be easily achieved like so:

>>> b = a.sum(axis=1)
>>> b
array([421, 535, 153, 342, 214])
>>> i = np.argsort(b)[::-1]
>>> i
array([1, 0, 3, 4, 2])
>>> a[i, :]
array([[  5,  30, 500],
       [  1,  20, 400],
       [  2,  40, 300],
       [  4,  10, 200],
       [  3,  50, 100]])

The steps involved are described in a little greater detail here.
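
For convenience, the whole procedure can be wrapped in a small helper. This is only a sketch: the function name sort_rows_by_sum is hypothetical, and the array is the small example from above rather than the real 419040-row file.

```python
import numpy as np

def sort_rows_by_sum(a):
    """Return the rows of a 2-D array ordered by row sum, largest first."""
    row_sums = a.sum(axis=1)            # one sum per row
    order = np.argsort(row_sums)[::-1]  # row indices, largest sum first
    return a[order]

a = np.array([[1, 20, 400],
              [5, 30, 500],
              [3, 50, 100],
              [2, 40, 300],
              [4, 10, 200]])

result = sort_rows_by_sum(a)
print(result)
```

Note that reversing the argsort result also reverses the relative order of rows whose sums are equal, which may or may not matter for your data.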

Upvotes: 1
