user1919035

Reputation: 227

Remove lots of whitespace in Python

I'm trying to write a custom tokenizer:

print(re.sub(' ',"\n",(re.sub('\\{|\\}|\\[|\\]|\\\\|\\/|\\\"|\\\'|\\,|\\=|\\(|\\)|\\:|\\||\\-|\\*|\\!|\\;|\\<|\\>|\\,|\\?|//@'," ",str))))

Output:

America




Category
States
of
the
United
States




Category
Southern
United
States





Link
FA
mk

Many newlines are being inserted. I'm trying to write optimized code that removes all empty lines with regular expressions, without handling every case individually. I'm really worried about the performance of the program: I have over 100 billion lines, so execution time matters. Any suggestions?

I'd like the output to look like this:

America
Category
States
of
the
United
States
Category
Southern
United
States
Link
FA
mk

Upvotes: 1

Views: 83

Answers (2)

Geoffrey Warne

Reputation: 162

re.sub('\n{2,}', '\n', str)

will remove the empty lines by collapsing each run of two or more newlines into a single newline.
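
As a minimal sketch of that substitution in context: the helper name `collapse_blank_lines` and the sample text are illustrative assumptions, not part of the question. Note that a run of newlines at the very start of the text is reduced to one newline rather than removed, so a call to `strip("\n")` may still be wanted.

```python
import re

def collapse_blank_lines(text):
    """Collapse each run of two or more newlines into a single newline."""
    return re.sub(r"\n{2,}", "\n", text)

# Hypothetical sample resembling the output shown in the question.
sample = "America\n\n\n\nCategory\nStates\n\n\nLink"
print(collapse_blank_lines(sample))
```

The variable is called `text` here rather than `str`, since `str` shadows the built-in type.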

Upvotes: 1

Christian Tapia

Reputation: 34146

You can use the join() and split() methods. Called with no arguments, split() splits on any run of whitespace and discards empty strings:

print(" ".join(your_string.split()))

Output:

America Category States of the United States Category Southern United States Link FA mk

Edit:

To get each word on a separate line, use "\n" instead of " ":

print("\n".join(your_string.split()))
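
The same idea can be folded into the tokenizer itself, so that no blank lines are produced in the first place. The following is only a sketch under assumptions: the delimiter set is modeled on the characters in the question's pattern (the literal `//@` alternative is not reproduced), and the names `DELIMITERS`, `tokenize` and `tokenize_lines` are hypothetical. Processing the input one line at a time keeps memory use flat for very large files.

```python
import re

# Assumed delimiter set, modeled on the question's pattern, plus whitespace.
# The trailing + treats a run of delimiters as one separator.
DELIMITERS = re.compile(r"""[{}\[\]\\/"',=():|\-*!;<>?\s]+""")

def tokenize(text):
    """Split text on runs of delimiters, dropping empty tokens."""
    return [token for token in DELIMITERS.split(text) if token]

def tokenize_lines(lines):
    """Yield tokens lazily from an iterable of lines (e.g. an open file)."""
    for line in lines:
        for token in tokenize(line):
            yield token

# Hypothetical input resembling the data implied by the question's output.
sample_lines = ["America\n", "[[Category:States of the United States]]\n"]
print("\n".join(tokenize_lines(sample_lines)))
```

Compiling the pattern once and matching runs of delimiters avoids the second substitution pass over the data.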

Upvotes: 4
