Conor McNally

Reputation: 75

Remove every empty line in file - Python/JSONl

I have some datasets collected with a Twitter scraper. To analyse the data further, I need the file to contain all the entries on consecutive lines, one after the other. I collected the data at a certain time during an event, so it cannot be recollected using new code. I need to write some code that removes the blank line between each tweet.

Here is an example of some of the data in the file.

1: Data Data Data etc
2: 
3: data data data
4: 

I have tried so many different methods to remove these blank lines with no success. My current code that I am trying is the following:

f = open(r"stream_london.jsonl", "r")
text = f.read()
lines = text.splitlines()

for line in lines:
    if line.isspace() == True:
        lines.write(line)

I have had no success. I need the code to rewrite the current file so that all the data is present, with entry 1 on line 1 and entry 2 on line 2, rather than it currently being on lines 1, 3, 5, 7 etc.

Can anybody help me with this? I managed all the Twitter scraping with relative ease, but I am now becoming frustrated that I cannot achieve such a simple task: removing the blank lines and moving all the data upwards to compact it.

Upvotes: 0

Views: 2899

Answers (4)

Conor McNally

Reputation: 75

Just a heads up for anyone searching for an easy solution!

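I was using VS Code as my text editor, so I did not need to write a Python script or any other code.

If you use the find-and-replace command in your text editor (in VS Code, with regular-expression mode enabled so that '\n' matches a line break) and replace '\n\n' with '\n', this removes every blank line, provided the blank lines are truly empty!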

Upvotes: 0

9769953

Reputation: 12221

Try this:

with open(r"stream_london.jsonl") as fh:
    for line in fh:
        if line.strip():
            print(line, end="")  # line already ends with a newline; or do other stuff with non-blank lines
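
To rewrite the file itself rather than print the lines, the same check can feed a second file that then replaces the original. This is only a minimal sketch: the helper name remove_blank_lines, the ".tmp" scratch file and the UTF-8 encoding are assumptions, not something from the question.

```python
import os


def remove_blank_lines(path, encoding="utf-8"):
    """Rewrite `path` without empty or whitespace-only lines.

    Returns the number of lines kept. Hypothetical helper; the encoding
    is an assumption about how the scraper wrote the file.
    """
    tmp_path = path + ".tmp"  # scratch file next to the original (assumed name)
    kept = 0
    with open(path, encoding=encoding) as src, \
            open(tmp_path, "w", encoding=encoding) as dst:
        for line in src:
            if line.strip():  # false for "" and for whitespace-only lines
                dst.write(line)
                kept += 1
    # Swap the compacted copy in only after it has been written completely.
    os.replace(tmp_path, path)
    return kept
```

Calling remove_blank_lines("stream_london.jsonl") would then compact the file in place. Since the data cannot be recollected, copying the file somewhere safe first is probably wise.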

Upvotes: 0

Theoul

Reputation: 65

    for line in lines:
        if line.strip():  # true only when the line has non-whitespace content
            print(line)

This should work. str.strip() removes whitespace from the start and end of a string if no argument is passed, so a blank line becomes an empty string, which is falsy. If you pass an argument (it has to be a string), the characters in that argument are removed from the start and end instead.
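
A small sketch of that behaviour, using a hypothetical helper is_blank that is not part of the original code:

```python
def is_blank(line):
    """Return True if the line is empty or contains only whitespace."""
    # strip() with no argument removes leading and trailing whitespace,
    # including "\n", so a blank line collapses to "" (which is falsy).
    return not line.strip()


print(is_blank("   \n"))        # True: whitespace-only line
print(is_blank('{"id": 1}\n'))  # False: line with data
# With an argument, strip() removes those characters instead of whitespace.
print("--data--".strip("-"))    # data
```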

Upvotes: 1

9769953

Reputation: 12221

If you are 101% sure that every even line should be removed, you can skip checking for an empty line (since, given your comment, it apparently contains more than whitespace), and test for the line number instead:

with open("stream_london.jsonl") as infile, open("stream_london_new.jsonl", "w") as outfile:
    for i, line in enumerate(infile):
        if i % 2:   # counting starts at 0, and `i % 2` is true for odd numbers
            continue
        outfile.write(line)
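
Before dropping lines purely by position, it may help to look at what the supposedly blank lines actually contain. A rough sketch, where peek_lines is a hypothetical helper and UTF-8 is an assumed encoding:

```python
from itertools import islice


def peek_lines(path, count=4, encoding="utf-8"):
    """Return the first `count` lines exactly as stored, line endings included."""
    # newline="" turns off newline translation, so "\r\n" stays visible.
    with open(path, encoding=encoding, newline="") as fh:
        return list(islice(fh, count))
```

Printing repr(line) for each returned line shows hidden characters such as '\r' that an editor would not display.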

Upvotes: 2
