claudiadast

Reputation: 611

Getting rid of "\r" when converting a file to a list in Python

I have an Excel file that looks like the following:

First_Name  Initials    Last_Name   Places  Email   Tel Fax Joint   Corresponding   Experimental design Data generation Data processing Data analysis   Statistical analysis    Manuscript preparation
Anna    A   Karenina    BioInform_Harvard   [email protected]  8885006000  8885006001  1       Y   Y   Y   Y   Y   Y
Konstantin  D   Levin   Neuro_Harvard   [email protected]  8887006000  8887006001  1               Y   Y   Y   
Alexei  K   Vronsky IGM_Columbia    [email protected]    8889006000  8889006001  2           Y               
Stepan  A   Oblonsky    NIMH    [email protected]   8891006000  8891006001  2       Y                   Y

In my Python code, I open the file as follows:

with open(filename, 'r') as f:
    for i in f:
        i = i.rstrip().split("\t")
        print(i)

The resulting list looks as follows. How do I get rid of the '\r'? I've tried various methods like replacing "\r" with "", but that messes up the elements of the list that look like 'Y\rKonstantin'.

['First_Name', 'Initials', 'Last_Name', 'Places', 'Email', 'Tel', 'Fax', 'Joint', 'Corresponding', 'Experimental design', 'Data generation', 'Data processing', 'Data analysis', 'Statistical analysis', 'Manuscript preparation\rAnna', 'A', 'Karenina', 'BioInform_Harvard', '[email protected]', '8885006000', '8885006001', '1', '', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y\rKonstantin', 'D', 'Levin', 'Neuro_Harvard', '[email protected]', '8887006000', '8887006001', '1', '', '', '', 'Y', 'Y', 'Y', '\rAlexei', 'K', 'Vronsky', 'IGM_Columbia', '[email protected]', '8889006000', '8889006001', '2', '', '', 'Y', '', '', '', '\rStepan']

I'm able to get rid of newline characters fine, but it's the '\r' I can't get rid of.

Upvotes: 0

Views: 99

Answers (2)

glibdud

Reputation: 7840

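The key thing to notice is that Python reads the whole file as one big line with all the \r characters embedded in it. Based on that, I'm guessing you're using Python 2.x, which doesn't enable universal newlines mode by default. Changing your mode to rU should do what you're expecting (in Python 3, universal newlines are already the default in text mode and the U flag was removed in 3.11, so plain 'r' is enough there):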

with open(filename, 'rU') as f:
    for i in f:
        i = i.rstrip().split("\t")
        print(i)

For more information, see the open() documentation.
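To see the effect in isolation, here is a minimal sketch for Python 3, where universal newlines are the default in text mode. The sample data and the temporary file are made up for illustration and are not the asker's file.

```python
import os
import tempfile

# Hypothetical sample data with old-style Mac line endings (bare "\r").
raw = b"First_Name\tInitials\rAnna\tA\rKonstantin\tD\r"

# Write the bytes to a temporary file so we can reopen it in text mode.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # Python 3: newline=None (the default) enables universal newlines,
    # so "\r", "\n" and "\r\n" all terminate a line when reading.
    with open(path, "r") as f:
        # Strip only line terminators; a bare rstrip() would also strip
        # trailing tabs and drop empty trailing columns.
        rows = [line.rstrip("\r\n").split("\t") for line in f]
finally:
    os.remove(path)

print(rows)
```

Each record comes back as its own list, with no '\r' left inside any field. Note the use of rstrip("\r\n") rather than rstrip(): the latter removes all trailing whitespace, tabs included, so rows ending in empty cells would lose columns.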

Upvotes: 1

Sam Mason

Reputation: 16174

As suggested, the csv module is good for dealing with this sort of data. I'd do something like:

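import csv

with open(filename) as fd:
    # Parse the file as tab-separated values
    inp = csv.reader(fd, delimiter='\t')

    # The first row holds the column names
    header = next(inp)
    print(header)

    # Each remaining row is a list of strings, one per column
    for row in inp:
        print(row)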

Python supports universal newlines, which means it does something sensible with "old-style" Mac line endings (a bare \r) by default. You can then use the csv module with a custom delimiter to parse the tab-delimited file.
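If you would rather look fields up by column name, csv.DictReader is one option. The following is a rough sketch under Python 3, using made-up sample data in a temporary file rather than the real spreadsheet export. It opens the file with newline='', as the csv documentation recommends, and lets the reader handle the bare "\r" terminators itself.

```python
import csv
import os
import tempfile

# Hypothetical tab-delimited data with bare "\r" line endings.
raw = (
    b"First_Name\tInitials\tLast_Name\r"
    b"Anna\tA\tKarenina\r"
    b"Konstantin\tD\tLevin\r"
)

with tempfile.NamedTemporaryFile(delete=False, suffix=".tsv") as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # newline="" passes line endings through untranslated; the csv reader
    # recognises "\r" and "\n" as end-of-line on its own.
    with open(path, newline="") as fd:
        # The first row becomes the dictionary keys for every later row.
        records = list(csv.DictReader(fd, delimiter="\t"))
finally:
    os.remove(path)

for record in records:
    print(record["First_Name"], record["Last_Name"])
```

Unlike splitting on "\t" by hand, this keeps empty cells as empty strings, so every row has the same keys as the header.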

Upvotes: 1
