Pavlos Panteliadis

Reputation: 1565

Read a file in Python and replace certain strings on the go

I want to read multiple files in Python in order to do some mapping between them.

I'm pretty new to this, so I got the code from someone else. Now I want to edit it, but I can't fully understand the Python constructs it uses.

Here's the code:

import csv

def getDataFromFile(infile):
    '''
    Opens a tab-delimited file, replaces every empty field (\t\t)
    with 'n/a', and returns the header of the file
    and a list of genes.

    '''
    with open(infile, 'r') as f:
        reader = csv.reader(f, delimiter='\t')  # Parse the file as tab-separated values.
        header = f.readline()                   # Store the header line in a variable.
        # Replace empty fields with 'n/a' so we have one universal marker for
        # non-existent values. Most databases don't insert a special character
        # for missing values, they just write \t\t, so be careful with that!
        geneList = [[x if x else 'n/a' for x in line] for line in reader]
        # With the above approach, we end up with a list of lists.
        # Every column has a value: either the one provided by the file
        # or our special marker for non-existent attributes, 'n/a'.
        header = header.split()  # header should be a list of strings.
        return header, geneList

How can I modify the list comprehension [[x if x else 'n/a' for x in line] for line in reader] so that it not only checks for empty fields ('\t\t') and replaces them with 'n/a', but also looks for other forms of 'non-existent' like 'NA' (used in R)?

I know it's a noob question, but I started using Python two weeks ago and I'm still in the learning process.

Upvotes: 0

Views: 132

Answers (1)

Jean-François Fabre

Reputation: 140305

Just add another test in your listcomp:

list = [[x if (x and x not in ["NA","whatever"]) else 'n/a' for x in line] for line in reader]

This may be clearer with the logic inverted and the empty string included in the list of values to check:

list = [['n/a' if (x in ["", "NA","whatever"]) else x for x in line] for line in reader]
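For illustration, here is a minimal, self-contained sketch of the same idea that can be run without a real file. The function name `normalize_rows`, the `MISSING_MARKERS` set and the sample data are assumptions made up for this example, not part of the original code; a set is used only so that the list of markers is easy to extend.

```python
import csv
import io

# Hypothetical set of markers treated as "missing"; adjust it to your data.
MISSING_MARKERS = {"", "NA", "NaN", "null"}

def normalize_rows(lines, missing=MISSING_MARKERS, placeholder="n/a"):
    """Parse tab-separated lines and replace every missing marker with placeholder."""
    reader = csv.reader(lines, delimiter="\t")
    return [[placeholder if x in missing else x for x in row]
            for row in reader]

# Small in-memory sample standing in for a real file (made-up data).
sample = io.StringIO("geneA\t\t12\ngeneB\tNA\t7\n")
rows = normalize_rows(sample)
print(rows)  # [['geneA', 'n/a', '12'], ['geneB', 'n/a', '7']]
```

The same comprehension works on the reader from the question; only the set of markers needs to match whatever your input files use for missing values.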

Upvotes: 1
