Reputation: 396
I have a text file looking like the following:
A B C D
0 4 3 5
3 3 5 8
8 1 5 7
9 3 7 9
The data is sorted by column C. What I am trying to do is write a Python (3.4) script that deletes every line whose value in column C does not equal the value of column C in another line. In other words, I need to pull out all lines that share their column C value with one or more other lines. The mockup code below is my attempt to show what I want to do:
For loop
    if lineXcolumnY == lineX2columnY2
    OR
    if lineX2columnY2 == lineXcolumnY
        print line X
    Else
        Delete line X
For the example data above, the script should then give me:
A B C D
3 3 5 8
8 1 5 7
I am a complete newbie to Python, so what confuses me most is how to refer to the text file in the script and how to refer to a specific column. In R I would write Data$C to refer to the column; what is the equivalent in Python?
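For reference, here is one way to do both in plain Python. This is a minimal sketch that assumes the file is whitespace-separated and that its first line holds the column names; the helper names read_table and column and the filename data.txt are hypothetical, chosen only for this illustration. The values come back as strings, so convert them with int() if you need numbers.

```python
import io


def read_table(f):
    """Read whitespace-separated text with a header row from file object f.

    Returns (header, rows): header is a list of column names and each row
    is a list of strings.
    """
    header = f.readline().split()                        # e.g. ['A', 'B', 'C', 'D']
    rows = [line.split() for line in f if line.strip()]  # skip blank lines
    return header, rows


def column(header, rows, name):
    """Return every value of the column called `name`, like Data$name in R."""
    idx = header.index(name)  # 'C' -> 2, because Python indexes from 0
    return [row[idx] for row in rows]


# With a real file you would write (data.txt is a placeholder name):
#     with open('data.txt') as f:
#         header, rows = read_table(f)
# Here an in-memory copy of the example data stands in for the file.
SAMPLE = """A B C D
0 4 3 5
3 3 5 8
8 1 5 7
9 3 7 9
"""
header, rows = read_table(io.StringIO(SAMPLE))
print(column(header, rows, 'C'))  # ['3', '5', '5', '7']
```

Looking columns up by header name rather than a hard-coded index keeps the script working if the column order in the file changes.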
Upvotes: 0
Views: 291
Reputation: 107347
You can use a collections.deque with maxlen=2 to keep the two most recent lines on each iteration and then compare their third columns:
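from collections import deque

q = deque(maxlen=2)   # holds the previous line and the current line
last_q = deque()      # collects the lines whose column C matches a neighbour
with open('newefile.txt', 'r') as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines, which have no column C
        q.append(line.strip())
        if len(q) == 2:
            # column C is the third whitespace-separated field (index 2)
            if q[0].split()[2] == q[1].split()[2]:
                last_q.extend(q)
                print(q[0], q[1], sep='\n')  # print is a function in Python 3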
Finally, you can write the lines collected in last_q back to your file (mode 'w' overwrites its previous contents):
with open('newefile.txt', 'w') as f:
    for line in last_q:
        f.write(line + '\n')  # strip() removed the newline, so add it back
Result (note that the header line A B C D is not kept):
3 3 5 8
8 1 5 7
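If the header should be kept, as in the expected output in the question, an alternative is itertools.groupby, which bundles runs of consecutive items that share a key. This is a minimal sketch of a different technique, not the approach above, and it assumes (as the question states) that the data lines are sorted by column C so equal values are adjacent; keep_shared_c is a hypothetical name. Unlike the deque version, it does not add a line twice when three or more consecutive lines share a value (the deque version extends last_q with the middle line once for each neighbour).

```python
from itertools import groupby


def keep_shared_c(lines):
    """Return the header plus every data line whose column C value also
    appears on at least one other line.

    Assumes the first line is a header and the data lines are sorted by
    column C, so lines with equal values sit next to each other.
    """
    header, *data = lines
    kept = [header]
    # groupby yields one group per run of consecutive lines with the same C
    for _, group in groupby(data, key=lambda line: line.split()[2]):
        group = list(group)
        if len(group) >= 2:  # value occurs more than once: keep the whole run
            kept.extend(group)
    return kept


lines = ["A B C D", "0 4 3 5", "3 3 5 8", "8 1 5 7", "9 3 7 9"]
print('\n'.join(keep_shared_c(lines)))
# A B C D
# 3 3 5 8
# 8 1 5 7
```

To use it on the file, read the stripped, non-blank lines into a list first and write the returned list back with one line per entry.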
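In this code, you append each line to the deque on every iteration. Because of maxlen=2, the oldest line is dropped automatically, so q always holds the previous line and the current one:
q.append(line.strip())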
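The sliding-window behaviour of maxlen can be seen in isolation with a short sketch; the three sample lines are taken from the example data.

```python
from collections import deque

# A deque with maxlen=2 keeps only the two most recently appended items;
# appending a third silently drops the oldest one from the left.
q = deque(maxlen=2)
for line in ["A B C D", "0 4 3 5", "3 3 5 8"]:
    q.append(line)
    print(list(q))
# ['A B C D']
# ['A B C D', '0 4 3 5']
# ['0 4 3 5', '3 3 5 8']
```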
Then you check the length of q, so the comparison only runs once two lines have been read:
if len(q) == 2:
Next, you access the third column by splitting each line on whitespace and picking the element at index 2:
q[0].split()[2]
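A quick sketch of what the split and the index return for one line of the example data:

```python
line = "3 3 5 8"
fields = line.split()  # split on any run of whitespace
print(fields)          # ['3', '3', '5', '8']
print(fields[2])       # 5  (index 2 is the third column, C)
```

The fields are strings, so the comparison in the answer is a string comparison: '5' and '05' would count as different. Wrapping both sides in int() avoids that if the column holds numbers written inconsistently.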
If the two values are equal, you add both lines in the deque to last_q with extend:
if q[0].split()[2] == q[1].split()[2]:
    last_q.extend(q)
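The difference between extend and append matters here, as this small sketch shows: extend adds each line in q as its own element, while append would store the whole deque as a single element.

```python
from collections import deque

q = deque(["3 3 5 8", "8 1 5 7"], maxlen=2)

extended = deque()
extended.extend(q)      # adds each line in q as its own element
print(list(extended))   # ['3 3 5 8', '8 1 5 7']

appended = deque()
appended.append(q)      # adds the whole deque as ONE element
print(len(appended))    # 1
```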
Upvotes: 1