ivanhoifung

Reputation: 349

Python reading files in a directory

I have a .csv with 3000 rows of data in 2 columns like this:

uc007ayl.1  ENSMUSG00000041439
uc009mkn.1  ENSMUSG00000031708
uc009mkn.1  ENSMUSG00000035491

In another folder I have graphs with names like this:

uc007csg.1_nt_counts.txt
uc007gjg.1_nt_counts.txt

Notice that the graph file names start with an identifier in the same format as my first column.

I am trying to use Python to find the rows that have a matching graph and write the value from the second column to a new .txt file.

This is the code I have so far:

import csv
with open("C:/*my dir*/UCSC to Ensembl.csv", "r") as f:
    reader = csv.reader(f, delimiter = ',')
    for row in reader:
        print(row[0])

But this is as far as I can get, and I am stuck.

Upvotes: 2

Views: 15534

Answers (5)

phihag

Reputation: 287745

You're almost there:

import csv
import os.path
with open("C:/*my dir*/UCSC to Ensembl.csv", "rb") as f:
    reader = csv.reader(f, delimiter = ',')
    for row in reader:
        graph_filename = os.path.join("C:/folder", row[0] + "_nt_counts.txt")
        if os.path.exists(graph_filename):
            print (row[1])

Note that the repeated calls to os.path.exists may slow down the process, especially if the directory lies on a remote filesystem and does not contain significantly more files than the number of lines in the CSV file. In that case, you may want to read the directory once with os.listdir instead:

import csv
import os

graphs = set(os.listdir("C:/graph folder"))
with open("C:/*my dir*/UCSC to Ensembl.csv", "rb") as f:
    reader = csv.reader(f, delimiter = ',')
    for row in reader:
        if row[0] + "_nt_counts.txt" in graphs:
            print (row[1])
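
The question asks for the matches in a new .txt file rather than printed. Below is a minimal Python 3 sketch of the same os.listdir idea wrapped in a function. The name ensembl_ids_with_graphs and its parameters are hypothetical, and it assumes a comma-separated file (pass delimiter='\t' if the data is tab-separated). Python 3's csv module expects the file in text mode with newline='' rather than "rb".

```python
import csv
import os


def ensembl_ids_with_graphs(csv_path, graph_dir, suffix="_nt_counts.txt", delimiter=","):
    """Return the 2nd-column value of every CSV row whose graph file exists (hypothetical helper)."""
    # List the directory once; membership tests on a set are cheap
    graphs = set(os.listdir(graph_dir))
    matches = []
    # Python 3's csv module wants text mode with newline=""
    with open(csv_path, newline="") as f:
        for row in csv.reader(f, delimiter=delimiter):
            # Skip blank or malformed rows instead of raising IndexError
            if len(row) >= 2 and row[0] + suffix in graphs:
                matches.append(row[1])
    return matches
```

You could then write each returned ID on its own line to the output .txt file.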

Upvotes: 3

jfs

Reputation: 414079

import csv
import os

# get prefixes of all graphs in another directory
suff = '_nt_counts.txt'
graphs = set(fn[:-len(suff)] for fn in os.listdir('another dir') if fn.endswith(suff))

with open(r'c:\path to\file.csv', 'rb') as f:
    # extract 2nd column if the 1st one is a known graph prefix
    # (assumes tab-separated data; use delimiter=',' for a comma-separated file)
    names = (row[1] for row in csv.reader(f, delimiter='\t') if row[0] in graphs)
    # write one name per line
    with open('output.txt', 'w') as output_file:
        for name in names:
            print >>output_file, name
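
The key trick here is that graphs holds the ID prefixes rather than the full file names, so the CSV check is a plain row[0] in graphs. A small standalone sketch of that step (graph_prefixes is a hypothetical name, not part of the answer) works on a plain list of names, so it can be tried without touching the filesystem. Also note that print >>output_file, name is Python 2 syntax; on Python 3 the equivalent is print(name, file=output_file).

```python
def graph_prefixes(filenames, suffix="_nt_counts.txt"):
    """Return the UCSC ID prefixes of graph files, ignoring unrelated files (hypothetical helper)."""
    if not suffix:
        # fn[:-0] would yield "", so an empty suffix cannot work with slicing
        raise ValueError("suffix must be non-empty")
    # endswith() filters out other files; the slice drops the suffix itself
    return {fn[:-len(suffix)] for fn in filenames if fn.endswith(suffix)}


names = ["uc007csg.1_nt_counts.txt", "uc007gjg.1_nt_counts.txt", "notes.txt"]
print(sorted(graph_prefixes(names)))  # ['uc007csg.1', 'uc007gjg.1']
```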

Upvotes: 0

Justin Fay

Reputation: 2606

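with open('C:/*my dir*/UCSC to Ensembl.csv', 'r') as source, open('result.txt', 'w') as result:
    for line in source:
        # strip the newline first, otherwise line[1] already ends in '\n'
        line = line.rstrip('\n').split(',')
        try:
            # try to open the matching graph file, closing it right away
            with open('/path/to/dir/' + line[0] + '_nt_counts.txt', 'r'):
                pass
        except IOError:
            # no graph for this ID: skip the row
            continue
        else:
            result.write(line[1] + '\n')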

Upvotes: 0

Burhan Khalid

Reputation: 174614

Well, the next step is to check whether the file exists. There are a few ways to do that, but I like the EAFP (easier to ask forgiveness than permission) approach:

import os

try:
    # the graph files are named <ID>_nt_counts.txt, not just <ID>
    with open(os.path.join(the_dir, row[0] + '_nt_counts.txt')) as f:
        pass
except IOError:
    print('Oops no file')

the_dir is the directory where the files are.
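
In Python 3, IOError is an alias of OSError, and a missing file raises FileNotFoundError, which is a subclass of it. Here is a hedged sketch that wraps the EAFP check in a helper (has_graph is a hypothetical name; the _nt_counts.txt suffix comes from the question). It catches only the missing-file case, so a permission problem still surfaces as an error instead of being reported as "no file".

```python
import os


def has_graph(the_dir, ucsc_id, suffix="_nt_counts.txt"):
    """EAFP check: try to open the graph file and report whether it was there (hypothetical helper)."""
    path = os.path.join(the_dir, ucsc_id + suffix)
    try:
        # Opening and immediately closing the file proves it exists
        with open(path):
            pass
    except FileNotFoundError:
        return False
    return True
```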

Upvotes: 0

vonPetrushev

Reputation: 5599

First, check that print row[0] really gives the correct file identifier.

Second, join the path of the graph folder with row[0] (plus the _nt_counts.txt suffix) and check whether that full path exists, i.e. whether the file exists, with os.path.exists(path) (see http://docs.python.org/library/os.path.html#os.path.exists ).

If it exists, you can write row[1] (the second column) to a new file with f2.write("%s\n" % row[1]) (you have to open f2 for writing first, of course).
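
The steps above can be sketched as one function, assuming Python 3 and a comma-separated CSV. write_matching_names is a hypothetical name; it returns how many names were written, which is handy for comparing the result against the number of graph files.

```python
import csv
import os


def write_matching_names(csv_path, graph_dir, out_path, suffix="_nt_counts.txt"):
    """Write row[1] to out_path for each CSV row whose graph file exists (hypothetical helper)."""
    count = 0
    with open(csv_path, newline="") as f, open(out_path, "w") as f2:
        for row in csv.reader(f):
            if len(row) < 2:
                # blank or malformed line: nothing to look up
                continue
            # build the full path from the graph folder and row[0]
            path = os.path.join(graph_dir, row[0] + suffix)
            # if the graph file exists, write the second column
            if os.path.exists(path):
                f2.write("%s\n" % row[1])
                count += 1
    return count
```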

Upvotes: 1
