Reputation: 3
I am new to Python and still learning. I have a text file with a variable number of spaces between the words on each line, and I am trying to read it as follows:
import re
results = []
with open ("../../103.Immune_gene_families/Immune_genes/Human/human_immunegene.hits") as file:
    for line in file:
        if not line.startswith("#"):
            line = re.sub("\s\s+" , " ", line)
            #print(line)
            ens_id = line.split(" ")[1]
            print(ens_id)
But I got the following error:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-3-469f5598d359> in <module>
6 line = re.sub("\s\s+" , " ", line)
7 #print(line)
----> 8 ens_id = line.split(" ")[1]
9 print(ens_id)
10
IndexError: list index out of range
Example lines I get with
print(line)
['ENSG00000128016', '115', '138', '107', '147', 'TF106503', '9', '32', '5.9', '8.3', '0', 'No_clan', '']
['ENSG00000128016', '135', '169', '130', '172', 'TF317698', '454', '488', '18.0', '0.00073', '0', 'No_clan', '']
['ENSG00000128016', '137', '175', '134', '196', 'TF318914', '95', '132', '21.9', '8e-05', '0', 'No_clan', '']
['ENSG00000128016', '137', '167', '130', '173', 'TF326635', '1096', '1127', '5.7', '3.3', '0', 'No_clan', '']
['ENSG00000128016', '138', '170', '133', '173', 'TF329017', '881', '912', '5.3', '4.3', '0', 'No_clan', '']
['ENSG00000128016', '139', '166', '129', '173', 'TF105541', '764', '791', '9.3', '0.38', '0', 'No_clan', '']
['ENSG00000128016', '139', '166', '132', '172', 'TF105970', '278', '305', '8.4', '0.6', '0', 'No_clan', '']
['ENSG00000128016', '140', '170', '131', '174', 'TF314946', '110', '140', '4.5', '6.3', '0', 'No_clan', '']
['ENSG00000128016', '142', '167', '134', '184', 'TF329287', '9', '33', '6.8', '2.3', '0', 'No_clan', '']
Any help with this would be much appreciated.
Thank you, AK
Upvotes: 0
Views: 591
Reputation: 27577
You get the IndexError because line.split(" ") returns fewer than 2 elements for at least one line, which means that line contains no space character at all (a blank line, for example). Try line.split(" ")[0] instead:
import re
results = []
with open("../../103.Immune_gene_families/Immune_genes/Human/human_immunegene.hits") as file:
    for line in file:
        if not line.startswith("#"):  # skip comment lines
            # collapse each run of whitespace into a single space
            line = re.sub(r"\s\s+", " ", line)
            # index 0 always exists, even when the line contains no space
            ens_id = line.split(" ")[0]
            print(ens_id)
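As an alternative to the regex, here is a minimal sketch that relies on str.split() with no argument, which splits on any run of whitespace and never returns empty strings. The function name extract_ids, the column parameter, and the sample data are made up for illustration; they are not from the original question.

```python
import io

def extract_ids(lines, column=0):
    """Return the field at `column` for every non-comment line that has one."""
    ids = []
    for line in lines:
        if line.startswith("#"):
            continue  # skip comment lines
        fields = line.split()  # no argument: split on any run of whitespace
        if len(fields) <= column:
            continue  # blank or too-short line: skip instead of raising IndexError
        ids.append(fields[column])
    return ids

# Made-up sample standing in for the real .hits file (an assumption)
sample = io.StringIO(
    "# comment line\n"
    "ENSG00000128016   115  138 TF106503\n"
    "\n"
    "ENSG00000128016 135   169   TF317698  \n"
)
print(extract_ids(sample))
```

With a real file, the open file object can be passed in place of the StringIO sample, since both yield one line per iteration.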
Upvotes: 0
Reputation: 2645
Welcome to SO!
If you run
string = 'abc'
print(string.split(' '))
you will see that the result is
['abc']
If you then tried string.split(' ')[1], you would get an IndexError, because the list has only one element.
So somewhere in your file there is likely a line that does not contain the character you are splitting on.
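One way to find such lines is to report every line that splits into fewer pieces than expected. This is only a sketch: find_short_lines, its parameters, and the sample lines are hypothetical names and data, not part of the original code.

```python
def find_short_lines(lines, min_fields=2, sep=" "):
    """Return (line_number, raw_line) pairs that split into fewer than min_fields pieces."""
    short = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("#"):
            continue  # comment lines are ignored, as in the question
        if len(line.split(sep)) < min_fields:
            short.append((number, line))
    return short

# Made-up lines for illustration (an assumption about what the file may contain)
sample_lines = [
    "# header\n",
    "ENSG00000128016 115 138\n",
    "\n",                # blank line: split(" ") gives ['\n']
    "No_spaces_here\n",  # no separator at all
]
for number, line in find_short_lines(sample_lines):
    print(number, repr(line))
```

Printing with repr() makes blank lines and stray tabs visible, which plain print() would hide.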
Upvotes: 1