Reputation: 75
I have some datasets collected with a Twitter scraper. To analyse the data further, I need the file to contain all of the entries on consecutive lines, but the scraper's output has a blank line after each tweet. I collected the data at a specific time during an event, so it cannot be recollected with new code. I need to write some code that removes the blank lines between the tweets.
Here is an example of some of the data in the file.
1: Data Data Data etc
2:
3: data data data
4:
I have tried many different methods to remove these blank lines, with no success. The code I am currently trying is the following:
f = open(r"stream_london.jsonl", "r")
text = f.read()
lines = text.splitlines()
for line in lines:
    if line.isspace() == True:
        lines.write(line)
I have had no success. I need the code to rewrite the current file so that all the data is present, with entry 1 on line 1 and entry 2 on line 2, rather than it currently being on lines 1, 3, 5, 7 etc.
Can anybody help me with this? I managed to do all the Twitter scraping with relative ease, but I am now becoming frustrated that I cannot achieve such a simple task: removing the blank lines and moving all the data upwards to compact it.
Upvotes: 0
Views: 2899
Reputation: 75
Just a heads up for anyone searching for an easy solution!
I was using VS Code as my text editor, so I did this without writing a Python script or any other code.
If you use the replace command in your text editor (with regular expression mode enabled) and replace '\n\n' with '\n', this will remove the blank line between each pair of entries!
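For anyone who would rather script the same idea, a rough Python equivalent might look like the sketch below. It assumes the blank lines are truly empty (no spaces or tabs on them), and the helper name `collapse_blank_lines` is made up for illustration.

```python
import re


def collapse_blank_lines(text: str) -> str:
    """Collapse runs of consecutive newlines into a single newline."""
    # Normalise Windows line endings first so "\r\n\r\n" is handled too.
    text = text.replace("\r\n", "\n")
    # "\n{2,}" matches two or more newlines in a row,
    # i.e. one or more empty lines between two entries.
    return re.sub(r"\n{2,}", "\n", text)


sample = "tweet one\n\ntweet two\n\n\ntweet three\n"
print(collapse_blank_lines(sample), end="")
```

Unlike a single editor pass over '\n\n', the `{2,}` quantifier also handles several blank lines in a row.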
Upvotes: 0
Reputation: 12221
Try this:
with open(r"stream_london.jsonl") as fh:
    for line in fh:
        if line.strip():
            # each line already ends with a newline, so don't let print add another
            print(line, end="")  # or do other stuff with non-blank lines
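Since the question asks for the file itself to be rewritten, here is a minimal sketch of one way to do that. It assumes the file is UTF-8 encoded, and the function name `remove_blank_lines` is just illustrative. Because the data cannot be recollected, it is probably worth copying the original file somewhere safe before running anything like this.

```python
import os
import tempfile


def remove_blank_lines(path: str) -> int:
    """Rewrite `path` without blank lines; return how many lines were kept."""
    kept = 0
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then swap it in,
    # so the original stays intact if something fails part-way through.
    with open(path, encoding="utf-8") as src, tempfile.NamedTemporaryFile(
        "w", encoding="utf-8", dir=directory, delete=False
    ) as tmp:
        for line in src:
            if line.strip():  # skip lines that are empty or whitespace-only
                tmp.write(line)
                kept += 1
    # Both files are closed here; replace the original with the cleaned copy.
    os.replace(tmp.name, path)
    return kept
```

Calling `remove_blank_lines("stream_london.jsonl")` would then leave entry 1 on line 1, entry 2 on line 2, and so on.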
Upvotes: 0
Reputation: 65
for line in lines:
    if line.strip():
        print(line)
This should work. str.strip() removes whitespace from the start and end of a string if no argument is passed, so a blank line becomes an empty string, which is falsy. If you pass an argument (it has to be a string), the characters in that argument are removed from the start and end instead.
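A few quick checks of that behaviour, as a small illustration (the sample strings are made up):

```python
padded = "  tweet text \n"
blank = " \t\n"
dashed = "--tweet--"

print(repr(padded.strip()))         # 'tweet text'
print(repr(blank.strip()))          # '' (an empty string, which is falsy)
print(repr(dashed.strip("-")))      # 'tweet'
# The argument is treated as a set of characters, not as a substring:
print(repr("yxhixy".strip("xy")))   # 'hi'
```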
Upvotes: 1
Reputation: 12221
If you are 101% sure that every even line should be removed, you can skip checking for an empty line (since, given your comment, it apparently contains more than whitespace), and test for the line number instead:
with open("stream_london.jsonl") as infile, open("stream_london_new.jsonl", "w") as outfile:
    for i, line in enumerate(infile):
        if i % 2:  # counting starts at 0, and `i % 2` is true for odd numbers
            continue
        outfile.write(line)
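Before dropping lines purely by position, it may help to look at what the "blank" lines actually contain. The sketch below reads the first few lines as raw bytes so that hidden characters such as `\r` become visible; the name `preview_lines` is hypothetical.

```python
from itertools import islice


def preview_lines(path, count=6):
    """Return the first `count` raw lines of the file as bytes, unmodified."""
    # Binary mode avoids any newline translation or text decoding.
    with open(path, "rb") as fh:
        return list(islice(fh, count))


# Example usage (assumes the file from the question exists):
# for number, raw in enumerate(preview_lines("stream_london.jsonl"), start=1):
#     print(number, raw)
```

If every second line shows up as something like `b'\r\n'` or `b'\n'`, the position-based approach above is dropping only line endings.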
Upvotes: 2