Tiago Minuzzi

Reputation: 141

Python - clustering data from columns

I have a file like this:

EgrG_000961100.1    IPR001611
EgrG_000961100.1    IPR032675
EgrG_000961100.1    IPR000742
EgrG_000961100.1    IPR001791
EgrG_000961100.1    IPR001611
EgrG_000989200.1    IPR000668
EgrG_000989200.1    IPR013201
EgrG_000989200.1    IPR025660
EgrG_000989200.1    IPR000668
EgrG_000989200.1    IPR025661
EgrG_000989200.1    IPR000169
EgrG_000704400.1    IPR013780
EgrG_000704400.1    IPR015341
EgrG_000704400.1    IPR011682
EgrG_000704400.1    IPR015341
EgrG_000704400.1    IPR011013

I would like to write one line per ID (ID = EgrG_*), with the next column containing all the IPRs for that ID, like this:

EgrG_000961100.1    IPR001611|IPR032675|IPR000742|IPR001791|IPR001611
EgrG_000989200.1    IPR000668|IPR025660|IPR000668|IPR025661|IPR000169
EgrG_000704400.1    IPR013780|IPR015341|IPR011682|IPR015341|IPR011013

I don't know how to do this in Python. Thanks in advance.

Upvotes: 1

Views: 59

Answers (1)

J. Goedhart

Reputation: 209

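# "file" is a placeholder for the path to your input file
with open("file", "r") as f:
    lines = f.readlines()

ipr_by_id = {}  # dictionary where the key is your ID and the value is a list of IPRs
for line in lines:
    if not line.strip():
        continue  # skip blank lines
    # I assume your txt file is TAB separated
    ID, IPR = line.rstrip("\n").split("\t")
    if ID in ipr_by_id:  # dict.has_key() no longer exists in Python 3
        ipr_by_id[ID].append(IPR)
    else:
        ipr_by_id[ID] = [IPR]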

Once you have the dictionary, write it to a file in the format you want, for example by joining each list of IPRs with "|". I think this will work. There are probably better or faster solutions, but I hope this helps.
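For the grouping and formatting steps together, something along these lines might do. This is only a sketch: it assumes Python 3.7+ (so the dictionary keeps IDs in the order they first appear), the function names `group_iprs` and `format_groups` are made up for illustration, and it splits on any whitespace rather than only tabs in case the file uses spaces.

```python
from collections import defaultdict


def group_iprs(lines):
    """Collect the IPR entries for each ID, keeping file order."""
    grouped = defaultdict(list)
    for line in lines:
        fields = line.split()  # splits on tabs or runs of spaces
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        gene_id, ipr = fields
        grouped[gene_id].append(ipr)
    return grouped


def format_groups(grouped):
    """Build one 'ID<TAB>IPR|IPR|...' string per ID."""
    return [
        "{}\t{}".format(gene_id, "|".join(iprs))
        for gene_id, iprs in grouped.items()
    ]


# Small in-memory sample taken from the question's data
sample = [
    "EgrG_000961100.1\tIPR001611\n",
    "EgrG_000961100.1\tIPR032675\n",
    "EgrG_000704400.1\tIPR013780\n",
]
for out_line in format_groups(group_iprs(sample)):
    print(out_line)
```

For a real file, pass the open file object to `group_iprs` in place of `sample` and write each formatted line, followed by a newline, to the output file. Duplicated IPRs are kept, as in the desired output shown in the question.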

Upvotes: 1
