Reputation: 141
I have a file like this:
EgrG_000961100.1 IPR001611
EgrG_000961100.1 IPR032675
EgrG_000961100.1 IPR000742
EgrG_000961100.1 IPR001791
EgrG_000961100.1 IPR001611
EgrG_000989200.1 IPR000668
EgrG_000989200.1 IPR013201
EgrG_000989200.1 IPR025660
EgrG_000989200.1 IPR000668
EgrG_000989200.1 IPR025661
EgrG_000989200.1 IPR000169
EgrG_000704400.1 IPR013780
EgrG_000704400.1 IPR015341
EgrG_000704400.1 IPR011682
EgrG_000704400.1 IPR015341
EgrG_000704400.1 IPR011013
I would like to write one line per ID (ID = EgrG_*), with a second column containing all the IPR entries for that ID, like this:
EgrG_000961100.1 IPR001611|IPR032675|IPR000742|IPR001791|IPR001611
EgrG_000989200.1 IPR000668|IPR025660|IPR000668|IPR025661|IPR000169
EgrG_000704400.1 IPR013780|IPR015341|IPR011682|IPR015341|IPR011013
I don't know how to do this in Python. Thanks in advance.
Upvotes: 1
Views: 59
Reputation: 209
    # Read all lines; each one is expected to look like "ID<whitespace>IPR"
    with open("file") as f:
        lines = f.readlines()

    ipr_by_id = {}  # dictionary where the key is your ID and the value is a list of IPRs
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        ID, IPR = line.split()  # split() works for tabs or spaces and drops the newline
        if ID in ipr_by_id:
            ipr_by_id[ID].append(IPR)
        else:
            ipr_by_id[ID] = [IPR]
Once you have the dictionary, just write it to a file in the format you want, joining each list of IPRs with "|". I think this will work. There are probably better or faster solutions, but I hope it helps.
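To make the writing step concrete, here is a minimal sketch of the whole thing, grouping and output together. The function names (`group_iprs`, `format_groups`, `write_grouped`) and the single-space separator in the output are my own assumptions, not anything from the question; it also relies on dictionaries keeping insertion order (Python 3.7+), so the IDs come out in the order they first appear.

```python
from collections import defaultdict


def group_iprs(lines):
    """Collect the IPR entries for each ID, keeping file order and duplicates."""
    groups = defaultdict(list)
    for line in lines:
        fields = line.split()  # splits on tabs or spaces, strips the newline
        if len(fields) != 2:
            continue  # assumption: silently skip blank or malformed lines
        gene_id, ipr = fields
        groups[gene_id].append(ipr)
    return groups


def format_groups(groups):
    """Return one 'ID IPR1|IPR2|...' string per ID."""
    return [gene_id + " " + "|".join(iprs) for gene_id, iprs in groups.items()]


def write_grouped(in_path, out_path):
    """Read in_path, group by ID, and write one line per ID to out_path."""
    with open(in_path) as src:
        groups = group_iprs(src)
    with open(out_path, "w") as dst:
        for row in format_groups(groups):
            dst.write(row + "\n")
```

You would call it as `write_grouped("file", "grouped.txt")`, where both file names are placeholders. If you only want each IPR once per ID, you could check `if ipr not in groups[gene_id]` before appending, but the example output in the question keeps duplicates, so the sketch does too.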
Upvotes: 1