frazman

Reputation: 33273

What is the delimiter in my text?

I have a long, messy file, and my friend tells me that he tab-delimited it. But when I do:

  tokens = line.split("\t")

it doesn't split.

Maybe I am missing something, but my friend seems pretty sure that the file is tab-delimited, and it looks tab-delimited as well.

sample file

10      AccessibleComputing     0       381202555       2010-08-26T22:38:36Z    OlEnglish       7181920 #F3#    [[Help:Reverting|Reverted]] edits by [[Special:Contributions/76.28.186.133|76.28.186.133]] ([[User talk:76.28.186.133|talk]]) to last version by Gurch  #REDIRECT#F0#[[Computer#F0#accessibility]]#F0#{{R#F0#from#F0#CamelCase}}        lo15ponaybcg2sf49sstw9gdjmdetnk ,Computer_accessibility

Is there a way to find out the hidden delimiter in Python?

Maybe by encoding the string in another format?

Upvotes: 1

Views: 111

Answers (2)

danodonovan

Reputation: 20373

Could the tabs and spaces have been muddled or converted? Maybe splitting on both tabs and runs of four spaces would help:

import re
re.split('\t|    ', line)
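To check which characters are actually in the file, it may help to look at repr() of a line, since that shows a tab as \t while leaving spaces as they are. A minimal sketch follows; the helper name whitespace_profile and the sample string are made up for illustration and are not from the question:

```python
from collections import Counter


def whitespace_profile(line):
    """Count each whitespace character in line, keyed by its repr()."""
    return Counter(repr(ch) for ch in line if ch.isspace())


# Hypothetical line that mixes a run of spaces with one real tab.
sample = "10      AccessibleComputing\t0"

print(repr(sample))                # the tab is shown as \t, spaces stay spaces
print(whitespace_profile(sample))  # how many of each whitespace character
```

If the counts show only spaces, the tabs were probably converted somewhere between your friend's file and the one you are reading.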

Upvotes: 1

Martijn Pieters

Reputation: 1123500

Just split on whitespace:

line.split()

str.split() with no arguments splits on variable-width whitespace and discards leading and trailing whitespace. Whitespace includes tabs, spaces, newlines and carriage returns:

>>> '10      AccessibleComputing     0       381202555       2010-08-26T22:38:36Z    OlEnglish       7181920 #F3#    [[Help:Reverting|Reverted]] edits by [[Special:Contributions/76.28.186.133|76.28.186.133]] ([[User talk:76.28.186.133|talk]]) to last version by Gurch  #REDIRECT#F0#[[Computer#F0#accessibility]]#F0#{{R#F0#from#F0#CamelCase}}        lo15ponaybcg2sf49sstw9gdjmdetnk ,Computer_accessibility'.split()
['10', 'AccessibleComputing', '0', '381202555', '2010-08-26T22:38:36Z', 'OlEnglish', '7181920', '#F3#', '[[Help:Reverting|Reverted]]', 'edits', 'by', '[[Special:Contributions/76.28.186.133|76.28.186.133]]', '([[User', 'talk:76.28.186.133|talk]])', 'to', 'last', 'version', 'by', 'Gurch', '#REDIRECT#F0#[[Computer#F0#accessibility]]#F0#{{R#F0#from#F0#CamelCase}}', 'lo15ponaybcg2sf49sstw9gdjmdetnk', ',Computer_accessibility']
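Note that in the output above, a field containing single spaces (such as the "edits by ... Gurch" comment) is broken into several tokens. If those fields need to stay whole, one possible heuristic is to split only on tabs or on runs of two or more whitespace characters. This is a sketch under the assumption that real field separators are either a tab or at least two spaces; a tab that was expanded to a single space cannot be told apart from an ordinary space this way. The name split_fields and the sample string are hypothetical:

```python
import re

# A run of two or more tabs/spaces, or a single tab, counts as a separator.
FIELD_SEPARATOR = re.compile(r"[\t ]{2,}|\t")


def split_fields(line):
    """Split line into fields, keeping single spaces inside a field."""
    return FIELD_SEPARATOR.split(line.strip())


# Hypothetical line: one tab, one run of four spaces, one run of two spaces.
sample = "10\tOlEnglish    edits by Gurch  #REDIRECT"
print(split_fields(sample))
# ['10', 'OlEnglish', 'edits by Gurch', '#REDIRECT']
```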

Upvotes: 6
