Reputation: 611
I have an Excel file that looks like the following:
First_Name Initials Last_Name Places Email Tel Fax Joint Corresponding Experimental design Data generation Data processing Data analysis Statistical analysis Manuscript preparation
Anna A Karenina BioInform_Harvard [email protected] 8885006000 8885006001 1 Y Y Y Y Y Y
Konstantin D Levin Neuro_Harvard [email protected] 8887006000 8887006001 1 Y Y Y
Alexei K Vronsky IGM_Columbia [email protected] 8889006000 8889006001 2 Y
Stepan A Oblonsky NIMH [email protected] 8891006000 8891006001 2 Y Y
In my Python code, I open the file as follows:
with open(filename, 'r') as f:
    for i in f:
        i = i.rstrip().split("\t")
        print(i)
The resulting list looks as follows. How do I get rid of the '\r'? I've tried various methods, like replacing "\r" with "", but that messes up the list elements that look like 'Y\rKonstantin'.
['First_Name', 'Initials', 'Last_Name', 'Places', 'Email', 'Tel', 'Fax', 'Joint', 'Corresponding', 'Experimental design', 'Data generation', 'Data processing', 'Data analysis', 'Statistical analysis', 'Manuscript preparation\rAnna', 'A', 'Karenina', 'BioInform_Harvard', '[email protected]', '8885006000', '8885006001', '1', '', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y\rKonstantin', 'D', 'Levin', 'Neuro_Harvard', '[email protected]', '8887006000', '8887006001', '1', '', '', '', 'Y', 'Y', 'Y', '\rAlexei', 'K', 'Vronsky', 'IGM_Columbia', '[email protected]', '8889006000', '8889006001', '2', '', '', 'Y', '', '', '', '\rStepan']
I'm able to get rid of newline characters fine, but it's the '\r' I can't get rid of.
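A minimal sketch that reproduces the symptom, assuming the file uses old-style Mac line endings (a bare `\r` between records), which is what the `'\r'` characters in the printed list suggest. The sample data is a hypothetical, trimmed-down version of the file. Opening with `newline="\n"` makes only `\n` end a line, roughly mimicking a reader without universal newlines (such as Python 2's plain `'r'` mode on Unix).

```python
import os
import tempfile

# Hypothetical sample: records separated by bare "\r" (old-style Mac line endings).
sample = "First_Name\tInitials\tLast_Name\rAnna\tA\tKarenina\rKonstantin\tD\tLevin\r"

# newline="" on write: no newline translation, the "\r" bytes are written as-is.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, newline="") as tmp:
    tmp.write(sample)
    path = tmp.name

# newline="\n": only "\n" terminates a line, so the whole file is ONE line
# and the "\r" characters stay embedded in the fields.
with open(path, newline="\n") as f:
    lines = [line.rstrip().split("\t") for line in f]

os.remove(path)
print(len(lines))  # 1
print(lines[0])    # 'Last_Name\rAnna' and 'Karenina\rKonstantin' appear as single fields
```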
Upvotes: 0
Views: 99
Reputation: 7840
The key thing to notice is that Python reads only one big line, with all the \r characters embedded in it. Based on that, I'm guessing you're using Python 2.x, which doesn't enable universal newlines mode by default. Changing your mode to rU should do what you're expecting:
with open(filename, 'rU') as f:
    for i in f:
        i = i.rstrip().split("\t")
        print(i)
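For more information, see the open() documentation. Note that in Python 3, text mode uses universal newlines by default, and the 'U' mode flag was deprecated and then removed in Python 3.11.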
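For Python 3, a minimal sketch of the equivalent fix: a plain `open()` (default `newline=None`) already treats `\r`, `\n` and `\r\n` as line endings and translates them to `\n`, so no `'U'` flag is needed. The sample data is hypothetical and includes an empty last cell. It strips only `"\n"` rather than calling a bare `rstrip()`, because `rstrip()` would also remove trailing tabs and silently drop empty columns at the end of a row.

```python
import os
import tempfile

# Hypothetical sample with "\r" line endings; Alexei's last cell is empty.
sample = "First_Name\tInitials\tLast_Name\tJoint\rAnna\tA\tKarenina\t1\rAlexei\tK\tVronsky\t\r"

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, newline="") as tmp:
    tmp.write(sample)
    path = tmp.name

# Default text mode = universal newlines: every "\r" becomes a line break "\n".
with open(path) as f:
    # Strip only the newline so trailing empty tab-separated fields survive.
    rows = [line.rstrip("\n").split("\t") for line in f]

os.remove(path)
for row in rows:
    print(row)
```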
Upvotes: 1
Reputation: 16174
As suggested, the csv module is good for dealing with this sort of data. I'd do something like:
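import csv

# newline='' is what the csv docs recommend; the reader then handles line endings itself
with open(filename, newline='') as fd:
    inp = csv.reader(fd, delimiter='\t')
    header = next(inp)
    print(header)
    for row in inp:
        print(row)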
Python 3 supports "magic" universal newlines, which means it does something sensible with "old-style" Mac line endings by default. You can then use the csv module with a custom delimiter to parse the tab-delimited file.
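Going one step further, a sketch using `csv.DictReader` with `delimiter="\t"` to map each person to the tasks marked `Y`. The sample data is a hypothetical, trimmed-down version of the file, and treating every column after the first three as a task column is an assumption that only holds for this sample (in the real file the task columns start after `Corresponding`).

```python
import csv
import os
import tempfile

# Hypothetical trimmed-down sample with "\r" line endings and empty cells.
header = ["First_Name", "Last_Name", "Joint", "Data generation", "Data analysis"]
records = [
    ["Anna", "Karenina", "1", "Y", "Y"],
    ["Alexei", "Vronsky", "2", "Y", ""],
]
sample = "\r".join("\t".join(r) for r in [header] + records) + "\r"

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, newline="") as tmp:
    tmp.write(sample)
    path = tmp.name

contributions = {}
# newline="" lets the csv reader see and handle the raw "\r" terminators.
with open(path, newline="") as fd:
    reader = csv.DictReader(fd, delimiter="\t")
    # Assumption for this sample: task columns come after the first three.
    task_columns = reader.fieldnames[3:]
    for row in reader:
        # Each task cell is either "Y" or an empty string.
        contributions[row["First_Name"]] = [c for c in task_columns if row[c] == "Y"]

os.remove(path)
print(contributions)
# {'Anna': ['Data generation', 'Data analysis'], 'Alexei': ['Data generation']}
```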
Upvotes: 1