Reputation: 46919
I am using the following to read a tab-separated file. There are three columns in the file, but the first column is being dropped when I print just the column header row. How can I include the first column, too?
f = open("/tmp/data.txt")
for l in f.readlines():
    print l.strip().split("\t")
    break
f.close()
Output:
['session_id\t', '\tevent_id_concat']
The first column name is id, but it is not printed in the above array.
print l yields the following:
'id\tsession_id\tevent_id_concat\r\n'
Output:
['id\t', '\tevent_id_concat']
Upvotes: 9
Views: 130813
Reputation: 4160
The question is a simple old Python 2 scenario, so I hope the following might be a more up-to-date and complete alternative to the others here.
The csv module was being used to read a CSV file generated from an Excel document, but when that changed to a tab-delimited file from a similar source I couldn't see why the csv module was necessary.
def read_rows(filename: str) -> list[dict[str, str]]:
    """Read TAB delimited file with header row and return rows."""
    with open(filename, newline="", encoding="utf-8") as tabfile:
        fieldnames = [field.strip() for field in next(tabfile).split("\t")]
        return [
            dict(zip(fieldnames, (field.strip() for field in line.split("\t"))))
            for line in tabfile.readlines()
        ]
rows = read_rows("/home/user/in.txt")
# rows is now a list of dict keyed on the field names from the first row
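For example, assuming the header row from the question (id, session_id, event_id_concat), each field can then be read by name:

for row in rows:
    print(row["id"], row["event_id_concat"])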
I'm curious why anyone would import the csv module just for this task.
To clarify the pre-conditions under which this approach is reliable: if the data has the following two characteristics, which is a common general scenario, then the simple "dict zip split slurp" above should work:
- the first line of the file is a header row containing the field names, and
- no field's content contains a tab (or newline) character.
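To illustrate why the second condition matters (the values below are made up): if a field contains an embedded tab, the naive split produces an extra column and zip silently drops or misaligns data, with no error raised.

header = "id\tname\tscore"
line = "1\tAnn\tSmith\t42"  # "Ann\tSmith" was meant to be a single field
print(dict(zip(header.split("\t"), line.split("\t"))))
# {'id': '1', 'name': 'Ann', 'score': 'Smith'} -- the trailing '42' is silently lost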
Given these two pre-conditions, I can't see any reason not to just slurp up the file like this and save an import of the csv module.
With CSV files, especially MS Excel CSV files, there are a number of gotchas and special cases where it is sensible to use the csv module for protection. But in general usage tab characters are rare in content, especially web content where the tab key is used to move between fields. It's quite common to hit a scenario where the pre-conditions mentioned above are guaranteed, and it seems a waste of effort, and extra lines of code, to bother with csv when a reliable tab delimiter is in use.
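For instance, an Excel-style CSV line with a quoted field containing the delimiter is handled correctly by csv.reader, where a plain split would not be (the data below is invented for illustration):

import csv, io

data = 'id,comment\n1,"hello, world"\n'
rows = list(csv.reader(io.StringIO(data)))
print(rows[1])                          # ['1', 'hello, world'] -- quoting respected
print(data.splitlines()[1].split(","))  # ['1', '"hello', ' world"'] -- naive split breaks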
Refer to the Python open documentation to understand the keyword arguments to the open call.
Upvotes: 0
Reputation: 947
I would suggest using the csv module. It is easy to use and fits best if you want to read in table-like structures stored in a CSV-like format (tab/space/something-else delimited).
The module documentation gives good examples, and the simplest usage is stated to be:
import csv
with open('/tmp/data.txt', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        print row
Every row is a list, which is very useful if you want to do index-based manipulations.
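For example (a short sketch, with the column order taken from the question's header), a single column can be collected by index:

import csv
with open('/tmp/data.txt', 'r') as f:
    reader = csv.reader(f, delimiter='\t')
    ids = [row[0] for row in reader]  # first-column value of every row (including the header cell "id")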
If you want to change the delimiter there is a keyword argument for that, but I am often fine with the predefined dialects, which can also be selected via a keyword.
import csv
with open('/tmp/data.txt', 'r') as f:
    reader = csv.reader(f, dialect='excel', delimiter='\t')
    for row in reader:
        print row
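If you prefer to look fields up by the header names instead of by index, csv.DictReader is a possible alternative (a sketch, assuming the same tab-delimited file with a header row as in the question):

import csv
with open('/tmp/data.txt', 'r') as f:
    for row in csv.DictReader(f, delimiter='\t'):
        print(row['id'])  # each row is a dict keyed on the header fields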
I am not sure if this will fix your problem, but if the error remains, using a well-established module assures you that something is wrong with your file rather than with your code.
Upvotes: 19
Reputation: 40963
It should work, but it is better to use with:
with open('/tmp/data.txt') as f:
    for l in f:
        print l.strip().split("\t")
If it doesn't, then your file probably doesn't have the required format.
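To check that, you could print the raw representation of the first line and confirm the fields really are separated by single tab characters:

with open('/tmp/data.txt') as f:
    print(repr(f.readline()))  # expected something like 'id\tsession_id\tevent_id_concat\r\n'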
Upvotes: 8