Reputation: 43
I'm just starting to learn, so sorry about any confusion.
I have two files. File A has the list of sample names I'm interested in, and File B has the data for all samples.
File A (no headers)
sample_A
sample_XA
sample_12754
samples_75t
File B
name description etc .....
sample_JA mm 0.01 0.1 1.2 0.018 etc
sample_A mm 0.001 1.2 0.8 1.4 etc
sample_XA hu 0.4 0.021 0.14 2.34 etc
samples_YYYY RN 0.0001 3.435 1.1 0.01 etc
sample_12754 mm 0.1 0.1 0.87 0.54 etc
sample_2248333 hu 0.43 0.01 0.11 2.32 etc
samples_75t mm 0.3 0.02 0.14 2.34 etc
I want to compare file A to file B and output the data from B but only for the sample names listed in A.
I tried this.
#!/usr/bin/env python2
import csv
count = 0
import collections
samples = collections.defaultdict(list)
with open('FILEA.txt') as f:
    sites = [l.strip() for l in f if l.strip()]
# This gives me the correct list of samples for file A.
with open('FILEB','r') as inF:
    for line in inF:
        elements = line.split()
        if sites.intersection(elements):
            count += 1
            print (elements)
## Here I get the names of all samples in file B, and only the names. I want the data that is in file B, but just for the samples in A.
Then I tried using an intersection.
#!/usr/bin/env python2
import sys
import csv
import collections
samples = collections.defaultdict(list)
with open('FILEA.txt','r') as f:
    nsamples = [l.strip() for l in f if l.strip()]
    print (nsamples)
with open ('FILEB','r') as inF:
    for row in inF:
        elements = row.split()
        if nsamples.intersection(elements):
            print(row[0,:])
Still doesn't work.
What do I have to do to get the output data as follows:
name description etc .....
sample_A mm 0.001 1.2 0.8 1.4 etc
sample_XA hu 0.4 0.021 0.14 2.34 etc
sample_12754 mm 0.1 0.1 0.87 0.54 etc
samples_75t mm 0.3 0.02 0.14 2.34 etc
Any ideas will be very much appreciated. Thanks.
Upvotes: 1
Views: 47
Reputation: 180411
Make a set of the lines from filea, then split each line from fileb once and see if the first element is in the set of data from filea:
with open("filea") as f, open("fileb") as f2:
    # make a set of lines, stripping newlines
    # so we can compare properly later, i.e. foo\n != foo
    st = set(map(str.rstrip, f))  # use itertools.imap on Python 2
    for line in f2:
        # split once and extract the first element to compare
        if line.strip() and line.split(None, 1)[0] in st:
            print(line.rstrip())
Output:
sample_A mm 0.001 1.2 0.8 1.4 etc
sample_XA hu 0.4 0.021 0.14 2.34 etc
sample_12754 mm 0.1 0.1 0.87 0.54 etc
samples_75t mm 0.3 0.02 0.14 2.34 etc
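The desired output in the question also includes the header row, which the loop above does not print. A minimal sketch of one way to keep it, assuming the header is always the first line of the data file; the function name `filter_rows`, the `keep_header` flag and the in-memory sample lists are hypothetical and only there for illustration:

```python
def filter_rows(wanted_lines, data_lines, keep_header=True):
    """Return rows of data_lines whose first field is listed in wanted_lines."""
    # Strip whitespace and skip blank lines so "foo\n" matches "foo".
    wanted = {name.strip() for name in wanted_lines if name.strip()}
    kept = []
    for index, line in enumerate(data_lines):
        fields = line.split(None, 1)  # split once; only the first field matters
        if not fields:
            continue  # blank line in the data file
        # Assumption: the first line of the data is the header row.
        if (keep_header and index == 0) or fields[0] in wanted:
            kept.append(line.rstrip())
    return kept


# Small in-memory stand-ins for the two files, shortened from the question.
names = ["sample_A\n", "sample_XA\n", "\n"]
data = [
    "name description etc\n",
    "sample_JA mm 0.01 0.1\n",
    "sample_A mm 0.001 1.2\n",
    "sample_XA hu 0.4 0.021\n",
]
result = filter_rows(names, data)
print("\n".join(result))
```

With real files, the open file objects can be passed directly, since they iterate line by line: `with open("filea") as f, open("fileb") as f2: rows = filter_rows(f, f2)`.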
Upvotes: 3