Hjalte

Reputation: 396

Remove unique values in textfile - if else - Python

I have a text file looking like the following:

A   B   C   D
0   4   3   5
3   3   5   8
8   1   5   7
9   3   7   9

The data is sorted by column C. What I am trying to do is write a Python (3.4) script that deletes each line where the value in column C does not equal the value of column C in any other line. In other words, I need to pull out all lines that share their column C value with at least one other line. The mock-up code below is my attempt to show what I want to do:

For loop
    if lineXcolumnY == lineX2columnY2
    OR
    if lineX2columnY2 == lineXcolumnY
        print line X
    Else
        Delete line X

On the above example of data, the code would then give me:

A   B   C   D
3   3   5   8
8   1   5   7

I am a complete newbie to Python, so what confuses me most is how to actually refer to the text file in the script, and how to refer to a specific column. In R I would use Data$C to refer to the column, but what is the equivalent in Python?

Upvotes: 0

Views: 291

Answers (1)

Kasravnd

Reputation: 107347

You can use collections.deque with a maximum length of 2 to keep two adjacent lines in each iteration, then compare their third columns:

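from collections import deque

q = deque(maxlen=2)  # sliding window: the previous line and the current line
last_q = deque()     # collects the lines to keep
with open('newefile.txt', 'r') as f:
    for line in f:
        q.append(line.strip())
        if len(q) == 2:
            # compare the third column (index 2) of the two adjacent lines
            if q[0].split()[2] == q[1].split()[2]:
                last_q.extend(q)
                print(q[0], q[1], sep='\n')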

Finally, you can write the lines collected in last_q back to your file:

with open('newefile.txt', 'w') as f:
    for line in last_q:
        f.write(line + '\n')  # the lines were stripped, so add the newline back

Result:

3   3   5   8 
8   1   5   7

In this code you append each line to the deque on every iteration:

q.append(line.strip())

Because the deque holds only one line after the first iteration, you then check that q contains two lines:

if len(q)==2
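
To make the role of maxlen=2 and the length check more concrete, here is a minimal sketch that is separate from the file handling. The function name adjacent_pairs and the variable window are made up for this illustration; window plays the same role as q in the answer.

```python
from collections import deque

def adjacent_pairs(items):
    """Return each pair of neighbouring items using a two-slot deque."""
    window = deque(maxlen=2)
    pairs = []
    for item in items:
        window.append(item)   # with maxlen=2 the oldest item drops out automatically
        if len(window) == 2:  # skipped on the first pass, when only one item is stored
            pairs.append(tuple(window))
    return pairs

print(adjacent_pairs(["a", "b", "c"]))  # [('a', 'b'), ('b', 'c')]
```

Each new item pushes the oldest one out, so the deque always holds the current line and the one just before it.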

Then you can access the third column by splitting the line and picking the third element (index 2, since Python counts from 0):

q[0].split()[2]
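
Regarding the Data$C part of the question: a plain text line has no named columns, so the usual approach is to split it and index the resulting list. The sketch below assumes the columns are separated by whitespace, as in the sample data; the variable names are only illustrative.

```python
line = "3   3   5   8"
fields = line.split()         # split on any run of whitespace
third = fields[2]             # index 2 is the third column, i.e. column C

# The position can also be looked up by name from the header line.
header = "A   B   C   D".split()
c_index = header.index("C")

print(fields)                 # ['3', '3', '5', '8']
print(third, c_index)         # 5 2
```

Note that the values are strings at this point; comparing them with == works for this task, but you would need int() or float() for arithmetic.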

And if the two values are equal, you add both lines from the deque to last_q with extend:

if q[0].split()[2] == q[1].split()[2]:
    last_q.extend(q)
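
One caveat with the pairwise comparison: as far as I can tell, it stores the middle line twice when three or more consecutive lines share the same value, and it does not keep the header line. As an alternative (a different technique, not the deque approach above), here is a sketch that counts the column C values with collections.Counter and keeps every row whose value occurs more than once. The function name keep_repeated and its column parameter are hypothetical, and the sketch assumes the first line is a header and that there are no blank lines.

```python
from collections import Counter

def keep_repeated(lines, column=2):
    """Keep the header plus every row whose value in `column` occurs more than once."""
    header, rows = lines[0], lines[1:]
    # First pass: count how often each value appears in the chosen column.
    counts = Counter(row.split()[column] for row in rows)
    # Second pass: keep only the rows whose value is shared with another row.
    return [header] + [row for row in rows if counts[row.split()[column]] > 1]

sample = [
    "A   B   C   D",
    "0   4   3   5",
    "3   3   5   8",
    "8   1   5   7",
    "9   3   7   9",
]
for row in keep_repeated(sample):
    print(row)
```

On the sample data this prints the header followed by the two rows with 5 in column C. To use it on a file, you could read the lines with f.read().splitlines() and write the returned list back with a newline after each line. Because it counts values instead of comparing neighbours, it does not depend on the data being sorted.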

Upvotes: 1
