drorhun

Reputation: 584

For Loop over a list in Python

I have a train_file.txt which has 3 columns on each row.

For example:

1 10 1
1 12 1
2 64 2
6 17 1
...

I am reading this txt file with

train_data = open("train_file.txt", 'r').readlines()

Then I am trying to get each value with a for loop

for eachline in train_data:
    uid, lid, x = eachline.strip().split()

Question: The training data file is huge, so I only want to read the first 1000 rows.

I tried the following code, but I get an error (TypeError: 'list' object cannot be interpreted as an integer)

for eachline in range(train_data,1000):
    uid, lid, x = eachline.strip().split()

Upvotes: 2

Views: 170

Answers (5)

Hugh

Reputation: 1361

I would recommend using the built-in csv library, since the data is csv-like (or pandas, if you're already using it), and opening the file with a with statement. So something like this:

import csv
from itertools import islice

with open('./test.csv', 'r') as input_file:
  csv_reader = csv.reader(input_file, delimiter=' ')
  rows = list(islice(csv_reader, 1000))

# Use rows
print(rows)

You don't need it right now but it will make escaped characters or multiline entries way easier to parse. Also, if there are headers you can use csv.DictReader to include them.
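As a rough sketch of the csv.DictReader idea, assuming the file had a header row (the file in the question does not, so the header names and the in-memory sample below are hypothetical):

```python
import csv
import io
from itertools import islice

# Hypothetical sample with a header row; the file in the question has no headers.
sample = io.StringIO("uid lid x\n1 10 1\n1 12 1\n2 64 2\n6 17 1\n")

reader = csv.DictReader(sample, delimiter=" ")
# The header line is consumed by DictReader itself, so islice counts data rows only.
first_rows = list(islice(reader, 2))

for row in first_rows:
    print(row["uid"], row["lid"], row["x"])  # values are still strings
```

Each row comes back as a mapping keyed by the header names, which can be easier to read than positional unpacking.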

Regarding your original code:

  • The call to readlines() reads every line into memory immediately, so filtering afterwards doesn't reduce how much of the file is read.
  • If you did read it that way, to get the first 1000 lines your for loop should be:
for eachline in train_data[:1000]:
  ...

Upvotes: 1

buran

Reputation: 14233

train_data is a list, use slicing: for eachline in train_data[:1000]:
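To illustrate why the original attempt fails: range() only accepts integers, so passing the list raises the TypeError from the question, while a slice returns the first items directly. A minimal sketch, using a short hypothetical list in place of the real file contents:

```python
# Hypothetical stand-in for the list returned by readlines().
train_data = ["1 10 1\n", "1 12 1\n", "2 64 2\n", "6 17 1\n"]

try:
    range(train_data, 1000)  # range() needs integers, not a list
except TypeError as exc:
    message = str(exc)
    print(message)

parsed = []
for eachline in train_data[:2]:  # slicing yields the first 2 lines (1000 in the question)
    uid, lid, x = eachline.strip().split()
    parsed.append((uid, lid, x))
```

A slice never raises if the list is shorter than the requested length; it just returns whatever is there.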

As the file is "huge" in your words, a better approach is to read just the first 1000 rows (readlines() will read the whole file into memory)

with open("train_file.txt", 'r') as f:
    train_data = []
    for idx, line in enumerate(f, start=1):
        train_data.append(line.strip().split())
        if idx == 1000:
            break

Note that data will be str, not int. You probably want to convert them to int.
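One possible way to do the conversion while reading, sketched here with itertools.islice instead of enumerate and with an in-memory sample standing in for the real file (the small limit is only for illustration):

```python
import io
from itertools import islice

# Stand-in for the open file; the values mirror the question's sample rows.
sample = io.StringIO("1 10 1\n1 12 1\n2 64 2\n6 17 1\n")

limit = 3  # would be 1000 for the real file
rows = []
for line in islice(sample, limit):
    # split() with no argument also discards the trailing newline
    uid, lid, x = (int(field) for field in line.split())
    rows.append((uid, lid, x))

print(rows)
```

Note that int() raises ValueError on a field that is not a whole number, so this assumes every column is an integer.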

Upvotes: 2

user2390182

Reputation: 73450

It is not necessary to read the entire file at all. You could use enumerate on the file directly and break early or use itertools.islice:

from itertools import islice

train_data = list(islice(open("train_file.txt", 'r'), 1000))

You can also keep using the same file handle to read more data later:

f = open("train_file.txt", 'r')
train_data = list(islice(f, 1000)) # reads first 1000
test_data = list(islice(f, 100))   # reads next 100
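A self-contained sketch of the same idea, wrapped in a with block so the handle is closed afterwards. The temporary file and the small slice sizes are assumptions made only so the example runs on its own:

```python
import os
import tempfile
from itertools import islice

# Write a small throwaway file so the sketch does not depend on train_file.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("".join(f"{i} {i * 2} 1\n" for i in range(10)))
    path = tmp.name

with open(path, "r") as f:
    train_data = list(islice(f, 6))  # first 6 lines
    test_data = list(islice(f, 3))   # next 3 lines, continuing where the first slice stopped

os.remove(path)  # clean up the temporary file
```

The second islice call picks up after the first because both consume the same underlying file iterator.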

Upvotes: 6

U13-Forward

Reputation: 71570

Maybe try changing this line:

train_data = open("train_file.txt", 'r').readlines()

To:

train_data = open("train_file.txt", 'r').readlines()[:1000]

Upvotes: 2

Mathieu

Reputation: 5746

You could use enumerate and a break:

for k, line in enumerate(lines):
    if k >= 1000:  # k starts at 0, so this stops after exactly 1000 lines
        break  # exit the loop

    # do stuff on the line
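One detail worth checking with this pattern is the comparison: enumerate counts from 0, so breaking on k >= limit processes exactly limit lines. A small sketch, with an in-memory sample and a small limit assumed purely for illustration:

```python
import io

# Stand-in for the open file: 50 generated lines in the question's 3-column format.
lines = io.StringIO("".join(f"{i} {i + 1} 1\n" for i in range(50)))

limit = 5  # would be 1000 in the question
processed = []
for k, line in enumerate(lines):
    if k >= limit:  # indexes 0..limit-1 have been handled, which is exactly `limit` lines
        break
    uid, lid, x = line.split()
    processed.append((uid, lid, x))

print(len(processed))
```

Because the loop iterates over the file object itself, only the lines up to the break are ever read.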

Upvotes: 1
