user54

Reputation: 583

Replace multiple newlines with a single newline while reading a file

I have the following code, which reads multiple files, parses the lines, and prints the result:

import os
import re

files=[]
pars=[]

for i in os.listdir('path_to_dir_with_files'):
    files.append(i)

for f in files:
    with open('path_to_dir_with_files'+str(f), 'r') as a:
       pars.append(re.sub('someword=|\,.*|\#.*','',a.read()))

for k in pars:
   print k

But the output contains multiple consecutive newlines:

test1


test2

Instead, I want the output to contain no empty lines:

 test1
 test2

and so on.

I tried playing with regexp:

pars.append(re.sub('someword=|\,.*|\#.*|^\n$','',a.read()))

But it doesn't work. I also tried strip(), rstrip(), and replace(), and none of them worked either.

Upvotes: 22

Views: 31045

Answers (7)

naren

Reputation: 1

A regex is the most direct solution here (the alternative is to loop over the string or its lines):

text = re.sub(r'[\n]+', '\n', text)
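As a rough illustration of why the `^\n$` attempt in the question has no effect while this pattern does: without re.MULTILINE, `^` anchors only at the start of the whole string. The sample text below is an assumption chosen to mirror the output shown in the question.

```python
import re

# Hypothetical sample mirroring the output shown in the question.
text = "test1\n\n\ntest2\n"

# Without re.MULTILINE, ^ matches only at the very start of the string,
# so this pattern finds nothing to replace in this text.
unchanged = re.sub(r'^\n$', '', text)

# Collapsing each run of newlines into one removes the blank lines.
collapsed = re.sub(r'\n+', '\n', text)

# With re.MULTILINE, ^ also matches right after every newline,
# so '^\n' matches each empty line.
multiline = re.sub(r'^\n', '', text, flags=re.MULTILINE)

print(repr(unchanged))
print(repr(collapsed))
print(repr(multiline))
```

Note that neither working variant removes a trailing newline at the end of the text; strip() can handle that if needed.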

Upvotes: 0

Mohamed Taher Alrefaie

Reputation: 16233

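A one-liner. Note that it reduces runs of three or more line-break characters to a single blank line rather than removing blank lines entirely: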

re.sub(r'[\r\n][\r\n]{2,}', '\n\n', sourceFileContents)
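A small sketch of what this pattern does to a string, next to a variant that drops blank lines altogether. The function names and the second pattern are illustrative assumptions, not part of the answer above.

```python
import re

def squeeze_blank_lines(text):
    # Three or more consecutive CR/LF characters become one blank line.
    return re.sub(r'[\r\n][\r\n]{2,}', '\n\n', text)

def drop_blank_lines(text):
    # Hypothetical variant: every run of LF or CRLF line breaks
    # becomes a single '\n', so no blank lines remain inside the text.
    return re.sub(r'(?:\r?\n)+', '\n', text)

print(repr(squeeze_blank_lines("a\n\n\n\nb")))
print(repr(drop_blank_lines("a\r\n\r\nb\n\n\nc")))
```

The first function leaves an existing single blank line ("a\n\nb") untouched, because the pattern needs at least three line-break characters to match.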

Upvotes: 1

Quin

Reputation: 97

Use a lookahead regular expression, r'\n(?=\n)', to match every newline that is immediately followed by another newline, and replace it with nothing. This finds and replaces all of these cases in one pass:

import os
import re

files=[]
pars=[]

for i in os.listdir('path_to_dir_with_files'):
    files.append(i)

for f in files:
    with open('path_to_dir_with_files'+str(f), 'r') as a:
       pars.append(re.sub(r'\n(?=\n)','',a.read()))

for k in pars:
   print k

Note: this won't help if files[0] ends with '\n' and files[1] begins with '\n'. You can use strip() for that case, and print will supply the single newline between files:

import os
import re

files=[]
pars=[]

for i in os.listdir('path_to_dir_with_files'):
    files.append(i)

for f in files:
    with open('path_to_dir_with_files'+str(f), 'r') as a:
       pars.append(re.sub(r'\n(?=\n)','',a.read().strip()))

for k in pars:
   print k

Upvotes: 0

yangsen chen

Reputation: 29

A simple approach, though it may not be efficient:

entire_file = "whatever\nmay\n\n\n\nhappen"

while '\n\n' in entire_file:
    entire_file = entire_file.replace("\n\n", "\n")

print(entire_file)

Upvotes: 1

vincent-lg

Reputation: 559

I'd just like to point out that regexes aren't the best way to handle this. Replacing two consecutive newlines with one in a Python str is quite simple, with no need for re:

entire_file = "whatever\nmay\n\nhappen"
entire_file = entire_file.replace("\n\n", "\n")

And voila! Much faster than re and (in my opinion) much easier to read.
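One caveat worth sketching: a single replace() pass only halves each run of newlines, so longer runs leave blank lines behind. Below is a re-free alternative that handles runs of any length. The function name is hypothetical, and it assumes that whitespace-only lines should be dropped too and that a trailing newline is not needed.

```python
def drop_blank_lines(text):
    # Keep only lines that contain something other than whitespace,
    # then rejoin them with single newlines.
    return "\n".join(line for line in text.splitlines() if line.strip())

sample = "whatever\nmay\n\n\n\nhappen"

# A single replace() pass turns four newlines into two, not one.
single_pass = sample.replace("\n\n", "\n")

print(repr(single_pass))
print(repr(drop_blank_lines(sample)))
```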

Upvotes: -3

Kewl

Reputation: 3417

Without changing your code much, one easy way would just be to check if the line is empty before you print it, e.g.:

import os
import re

files=[]
pars=[]

for i in os.listdir('path_to_dir_with_files'):
    files.append(i)

for f in files:
    with open('path_to_dir_with_files'+str(f), 'r') as a:
        pars.append(re.sub('someword=|\,.*|\#.*','',a.read()))

for k in pars:
    if not k.strip() == "":
        print k

*** EDIT: Since each element in pars is actually the entire content of a file (not just a line), you need to go through it and replace any repeated line endings, which is easiest to do with re:

import os
import re

files=[]
pars=[]

for i in os.listdir('path_to_dir_with_files'):
    files.append(i)

for f in files:
    with open('path_to_dir_with_files'+str(f), 'r') as a:
        pars.append(re.sub('someword=|\,.*|\#.*','',a.read()))

for k in pars:
    k = re.sub(r"\n+", "\n", k)
    if not k.strip() == "":
        print k

Note that this doesn't handle the case where one file ends with a newline and the next one begins with one. If that case worries you, either add extra logic to deal with it or change the way you read the data in.
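One possible way to change how the data is read, sketched in Python 3: process each file line by line and skip lines that end up empty, so file boundaries no longer matter. The function name cleaned_lines, the sorted file order, and the use of strip() (which also trims surrounding spaces) are assumptions made for this illustration.

```python
import os
import re

# Same substitution as in the question, written as a raw string.
PATTERN = re.compile(r'someword=|,.*|#.*')

def cleaned_lines(directory):
    """Yield the non-empty cleaned lines of every regular file in `directory`."""
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        with open(path, 'r') as handle:
            for line in handle:
                cleaned = PATTERN.sub('', line).strip()
                if cleaned:  # drop lines that are empty after cleaning
                    yield cleaned
```

Usage would look like `for line in cleaned_lines('path_to_dir_with_files'): print(line)`. Because '.' never matches a newline, applying the pattern per line gives the same substitutions as applying it to the whole file.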

Upvotes: 2

Kris

Reputation: 1438

You could use a second regex to replace multiple newlines with a single newline, and then use strip() to get rid of the trailing newline:

import os
import re

files=[]
pars=[]

for i in os.listdir('path_to_dir_with_files'):
    files.append(i)

for f in files:
    with open('path_to_dir_with_files/'+str(f), 'r') as a:
        word = re.sub(r'someword=|\,.*|\#.*','', a.read())
        word = re.sub(r'\n+', '\n', word).strip()
        pars.append(word)

for k in pars:
   print k
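For readers on Python 3, the two substitutions can be pulled out into a small helper and tried on a string. This is only a sketch: the name clean_text and the sample input are hypothetical, and the patterns are the ones used above.

```python
import re

def clean_text(text):
    # Step 1: the substitution from the question, as a raw string.
    text = re.sub(r'someword=|,.*|#.*', '', text)
    # Step 2: collapse each run of newlines, then trim both ends.
    return re.sub(r'\n+', '\n', text).strip()

# Hypothetical file content used only for illustration.
sample = "someword=test1,abc\n# comment\n\nsomeword=test2\n"
print(clean_text(sample))
```

When building the file path, os.path.join('path_to_dir_with_files', f) avoids having to add the '/' separator by hand.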

Upvotes: 26
