Reputation: 584
I have a train_file.txt which has 3 columns on each row.
For example;
1 10 1
1 12 1
2 64 2
6 17 1
...
I am reading this txt file with
train_data = open("train_file.txt", 'r').readlines()
Then I am trying to get each value with for loop
for eachline in train_data:
uid, lid, x = eachline.strip().split()
Question: Train data is a huge file that's why I want to just get the first 1000 rows.
I was trying to execute the following code but I am getting an error ('list' object cannot be interpreted as an integer)
for eachline in range(train_data,1000)
uid, lid, x = eachline.strip().split()
Upvotes: 2
Views: 170
Reputation: 1361
I would recommend using the csv
built in library since the data is csv-like (or the pandas one if you're using it), and using with
. So something like this:
import csv
from itertools import islice
with open('./test.csv', 'r') as input_file:
csv_reader = csv.reader(input_file, delimiter=' ')
rows = list(islice(csv_reader, 1000))
# Use rows
print(rows)
You don't need it right now but it will make escaped characters or multiline entries way easier to parse. Also, if there are headers you can use csv.DictReader
to include them.
Regarding your original code:
readlines()
will read all lines at that point so doing any filtering after won't make a difference.for
loop should be:for eachline in traindata[:1000]:
...
Upvotes: 1
Reputation: 14233
train_data
is a list, use slicing:
for eachline in train_data[:1000]:
As the file is "huge" in your words a better approach is to read just first 1000 rows (readlines()
will read the whole file in memory)
with open("train_file.txt", 'r'):
train_data = []
for idx, line in enumerate(f, start=1):
train_data.append(line.strip.split())
if idx == 1000:
break
Note that data will be str
, not int
. You probably want to convert them to int
.
Upvotes: 2
Reputation: 73450
It is not necessary to read the entire file at all. You could use enumerate
on the file directly and break early or use itertools.islice
:
from itertools import islice
train_data = list(islice(open("train_file.txt", 'r'), 1000))
You can also keep using the same file handle to read more data later:
f = open("train_file.txt", 'r')
train_data = list(islice(f, 1000)) # reads first 1000
test_data = list(islice(f, 100)) # reads next 100
Upvotes: 6
Reputation: 71570
Maybe try changing this line:
train_data = open("train_file.txt", 'r').readlines()
To:
train_data = open("train_file.txt", 'r').readlines()[:1000]
Upvotes: 2
Reputation: 5746
You could use enumerate and a break:
for k, line in enumerate(lines):
if k > 1000:
break # exit the loop
# do stuff on the line
Upvotes: 1