Reputation:
I have a huge text file that contains a data set like this:
EOG61ZHH8 ENSRNOG00000004762 627
EOG61ZHH8 ENSRNOG00000004762 627
EOG61ZHH9 ENSG00000249709 1075
EOG61ZHH9 ENSG00000249709 230
EOG61ZHH9 ENSG00000249709 87
EOG61ZHHB ENSG00000134030 2347
EOG61ZHHB ENSG00000134030 3658
EOG61ZHHB ENSRNOG00000018342 241
EOG61ZHHB ENSRNOG00000018342 241
EOG61ZHHC ENSBTAG00000006084 1159
EOG61ZHHC ENSG00000158828 820
EOG61ZHHC ENSMMUG00000000126 631
and I want to split it into one file per first-column value, like this:
EOG61ZHH8.txt
ENSRNOG00000004762 627
ENSRNOG00000004762 627
EOG61ZHH9.txt
ENSG00000249709 1075
ENSG00000249709 230
ENSG00000249709 87
and so on. I have no clue where to start in order to produce the new .txt files from the text file above. I have done this before, but those entries had a '[' sign before the start of each entry. Now I have many files with no special marker to split on. This is the Python code I used back then:
    out = None
    with open("entry.txt") as f:
        for line in f:
            if line[0] == "[":
                if out: out.close()
                out = open(line.split()[1] + ".txt", "w")
            else: out.write(line)
I am on Windows. I already know about the Linux awk command, so please don't post Linux-only solutions.
Upvotes: 0
Views: 214
Reputation:
With regular expressions:

    import re

    string = ' EOG61ZHH8 ENSRNOG00000004762 627 EOG61ZHH8 ENSRNOG00000004762 627 EOG61ZHH9 ENSG00000249709 1075 EOG61ZHH9 ENSG00000249709 230 EOG61ZHH9 ENSG00000249709 87 EOG61ZHHB ENSG00000134030 2347 EOG61ZHHB ENSG00000134030 3658 EOG61ZHHB ENSRNOG00000018342 241 EOG61ZHHB ENSRNOG00000018342 241 EOG61ZHHC ENSBTAG00000006084 1159 EOG61ZHHC ENSG00000158828 820 EOG61ZHHC ENSMMUG00000000126 631'
    # each record is three whitespace-separated fields: key, ID, number
    result = re.findall(r'\s+(.*?)\s+(.*?)\s+(\d+)', string, re.S)
    buffer = {}
    for i in result:
        if i[0] not in buffer:
            buffer[i[0]] = ''
        buffer[i[0]] = buffer[i[0]] + i[1] + ' ' + i[2] + '\n'
    for i in buffer.items():
        print(i)
        filename = i[0] + '.txt'
        content = i[1]  # you could remove the unneeded "\n" here with substring if wanted
        # create the file "filename" and write "content" to it
        with open(filename, 'w') as out:
            out.write(content)
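If the data sits in a file rather than in a string, the same grouping idea can be sketched without the regex. This is only a rough sketch: `split_by_key` and its `out_dir` parameter are names made up for illustration, and it assumes the fields are separated by any whitespace (spaces or tabs). Like the dictionary approach above, it holds every row in memory before writing, so it may not suit a really huge file, but it does cope with keys that are not grouped together.

```python
import os
from collections import defaultdict

def split_by_key(path, out_dir="."):
    """Group rows by their first column and write one <key>.txt per group.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    groups = defaultdict(list)
    with open(path) as f:
        for line in f:
            parts = line.split(None, 1)  # split once, on any whitespace
            if len(parts) < 2:
                continue  # skip blank or single-column lines
            key, rest = parts
            groups[key].append(rest.strip() + "\n")
    for key, rows in groups.items():
        with open(os.path.join(out_dir, key + ".txt"), "w") as out:
            out.writelines(rows)
    return sorted(groups)
```

Calling `split_by_key("entry.txt")` would write the files into the current directory and return the list of keys it found.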
Upvotes: 0
Reputation: 78590
You need only a few adjustments to your script:
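    out = None
    oldfile = None
    with open("entry.txt") as f:
        for line in f:
            # the first tab-separated column names the output file
            newfile = line.split("\t")[0]
            if newfile != oldfile:
                if out: out.close()
                out = open(newfile + ".txt", "w")
                oldfile = newfile
            # write the remaining columns unchanged
            out.write("\t".join(line.split("\t")[1:]))
    if out: out.close()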
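In case the columns turn out to be separated by spaces rather than tabs, here is a possible variant of the same idea using `itertools.groupby`. Treat it as a sketch under assumptions: `split_sorted_file` and `out_dir` are hypothetical names, and it assumes that all rows sharing a key are adjacent, as in the sample. If a key shows up again later in the file, its output file is overwritten. It streams the input, so only one line is held in memory at a time.

```python
import os
from itertools import groupby

def split_sorted_file(path, out_dir="."):
    """Stream a file whose rows are already grouped by first column.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    written = []
    with open(path) as f:
        # split each line once on any whitespace (spaces or tabs)
        rows = (line.split(None, 1) for line in f)
        # drop blank or single-column lines
        rows = (r for r in rows if len(r) == 2)
        for key, group in groupby(rows, key=lambda r: r[0]):
            with open(os.path.join(out_dir, key + ".txt"), "w") as out:
                for _, rest in group:
                    out.write(rest.strip() + "\n")
            written.append(key)
    return written
```

`split_sorted_file("entry.txt")` returns the keys in the order they were encountered, which makes it easy to check how many files were produced.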
Upvotes: 1