Reputation: 33273
I have a long, messy file, and my friend tells me that he tab-delimited it. But when I do:
tokens = line.split("\t")
it doesn't split.
Maybe I am missing something. My friend is sure that the file is tab-delimited, and it looks tab-delimited as well.
Sample line from the file:
10 AccessibleComputing 0 381202555 2010-08-26T22:38:36Z OlEnglish 7181920 #F3# [[Help:Reverting|Reverted]] edits by [[Special:Contributions/76.28.186.133|76.28.186.133]] ([[User talk:76.28.186.133|talk]]) to last version by Gurch #REDIRECT#F0#[[Computer#F0#accessibility]]#F0#{{R#F0#from#F0#CamelCase}} lo15ponaybcg2sf49sstw9gdjmdetnk ,Computer_accessibility
Is there a way to find out what the hidden delimiter is in Python?
Maybe by displaying the string in another format?
Upvotes: 1
Views: 111
Reputation: 20373
Could the tabs and spaces have been muddled or converted? Splitting on both tabs and spaces might help:
import re
re.split('\t| ', line)
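One caveat, shown as a small sketch: the sample string below is made up for illustration and is not taken from the asker's file. With the alternation pattern, two adjacent delimiters produce an empty string in the result, whereas a character class followed by `+` treats a run of tabs and spaces as a single delimiter.

```python
import re

# Hypothetical sample line mixing a tab and a run of two spaces.
line = "10\tAccessibleComputing  0"

# Alternation splits at every single delimiter, so adjacent delimiters
# leave an empty string between them.
single = re.split('\t| ', line)

# A character class with + collapses a run of tabs/spaces into one delimiter.
collapsed = re.split(r'[\t ]+', line)

print(single)     # ['10', 'AccessibleComputing', '', '0']
print(collapsed)  # ['10', 'AccessibleComputing', '0']
```

Which variant is appropriate depends on whether empty fields are meaningful in the file.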
Upvotes: 1
Reputation: 1123500
Just split on whitespace:
line.split()
str.split() with no arguments splits on runs of whitespace of any length and ignores leading and trailing whitespace. Whitespace includes tabs, spaces, newlines and carriage returns:
>>> '10 AccessibleComputing 0 381202555 2010-08-26T22:38:36Z OlEnglish 7181920 #F3# [[Help:Reverting|Reverted]] edits by [[Special:Contributions/76.28.186.133|76.28.186.133]] ([[User talk:76.28.186.133|talk]]) to last version by Gurch #REDIRECT#F0#[[Computer#F0#accessibility]]#F0#{{R#F0#from#F0#CamelCase}} lo15ponaybcg2sf49sstw9gdjmdetnk ,Computer_accessibility'.split()
['10', 'AccessibleComputing', '0', '381202555', '2010-08-26T22:38:36Z', 'OlEnglish', '7181920', '#F3#', '[[Help:Reverting|Reverted]]', 'edits', 'by', '[[Special:Contributions/76.28.186.133|76.28.186.133]]', '([[User', 'talk:76.28.186.133|talk]])', 'to', 'last', 'version', 'by', 'Gurch', '#REDIRECT#F0#[[Computer#F0#accessibility]]#F0#{{R#F0#from#F0#CamelCase}}', 'lo15ponaybcg2sf49sstw9gdjmdetnk', ',Computer_accessibility']
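To check which delimiter a line really contains, one option is to look at its repr() and count the whitespace characters. This is only a sketch: the sample string is hypothetical (a single real tab, the rest ordinary spaces), and in practice `line` would be read from the file in question.

```python
from collections import Counter

# Hypothetical sample line: one real tab, the rest ordinary spaces.
line = "10\tAccessibleComputing 0 381202555\n"

# repr() shows escape sequences, so a tab appears as \t instead of a gap.
print(repr(line))  # '10\tAccessibleComputing 0 381202555\n'

# Count each whitespace character to see which one the line actually uses.
whitespace_counts = Counter(ch for ch in line if ch.isspace())
print(whitespace_counts)
```

If the count for '\t' is zero, the tabs were probably converted to spaces somewhere along the way, which would explain why split("\t") returns the line unsplit.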
Upvotes: 6