I have a text/TSV file, the end of which looks something like this:
chrY 57000000 57099999 0
chrY 57100000 57199999 0
chrY 57200000 57227415 0
[blank line]
The blank line is recognized as a line in Notepad (Ln 30895, Col 1), and when I press Backspace the cursor goes to the second-to-last line (where I want the file to end). However, it isn't picked up by any of the commonly suggested commands for removing the last line; they all delete the line before it instead, producing output like this:
chrY 57000000 57099999 0
chrY 57100000 57199999 0
[blank line]
rather than:
chrY 57000000 57099999 0
chrY 57100000 57199999 0
chrY 57200000 57227415 0
What I've tried so far:
head -n -1 in.txt > out.txt
sed '$d' in.txt > out.txt
grep -v -e '^$' in.txt > out.txt
It is also worth noting that this last line contains no whitespace characters (e.g. spaces or tabs).
As always, any help is really appreciated!
Update
A hex dump produces the following output:
000d0690: 0935 3730 3939 3939 3909 300a 6368 7259 .57099999.0.chrY
000d06a0: 0935 3731 3030 3030 3009 3537 3139 3939 .57100000.571999
000d06b0: 3939 0930 0a63 6872 5909 3537 3230 3030 99.0.chrY.572000
000d06c0: 3030 0935 3732 3237 3431 3509 300a 00.57227415.0.
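A lighter-weight check than a full hex dump is to look only at the final byte of the file. The sketch below is plain POSIX sh; the function name ends_with_newline is a hypothetical choice for illustration, and tail -c and od are standard utilities.

```shell
# Succeeds (exit status 0) if the file's last byte is a newline.
# $(...) strips trailing newlines, so a final "\n" turns into the empty string.
# Caveat: an empty file also passes this check.
ends_with_newline() {
    [ -z "$(tail -c 1 "$1")" ]
}

# Example usage (in.txt is the file from the question):
#   ends_with_newline in.txt && echo "in.txt ends with a newline"
#   tail -c 1 in.txt | od -c    # prints the final byte in readable form
```

For the file in the dump above, the last byte is 0a, so the check would succeed.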
From my understanding, I need to delete this final newline character from these files in order to run them through another tool. Is there any way to delete the newline character after the text file has been created?
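On POSIX platforms, text files are required to have a newline at the end of every line, including the last. Your hex dump shows exactly that: the file ends in 300a, i.e. the final 0 followed by a single newline (0a), so there is no extra empty line. Notepad merely lets the cursor sit after that last newline, which is why the commands you tried remove the last real record instead.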
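The newline-per-line rule can be observed with the question's own data: wc -l counts newline characters, so three records that each end in one newline count as three lines, with no fourth, empty one. This is a sketch: sample_tail is a hypothetical helper that reproduces the last three records, and head -n -1 (a negative count) is a GNU coreutils extension.

```shell
# Hypothetical helper: print the last three records from the question,
# tab-separated and each ending in "\n", as in the hex dump.
sample_tail() {
    printf 'chrY\t57000000\t57099999\t0\n'
    printf 'chrY\t57100000\t57199999\t0\n'
    printf 'chrY\t57200000\t57227415\t0\n'
}

sample_tail | wc -l               # 3: one newline per line, no extra empty line
sample_tail | head -n -1 | wc -l  # 2: GNU head drops the last real record
```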
If you need to strip the final newline when passing the file to a tool which requires the last line to be unterminated, I would remove it only in the stream you pass to that tool, rather than wreck the integrity of the physical file.
brokentool -input <(awk 'FNR>1 { printf "\n" } { printf "%s", $0 }' in.txt)
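If the same filter is needed in several places, the Awk program above can be wrapped in a small shell function. This is a sketch: the name strip_final_newline is hypothetical, and brokentool -input is the answer's own placeholder for the tool that rejects a final newline.

```shell
# Copy the input to stdout, dropping only the newline after the last line.
# Reads the named file, or stdin if none is given. Intended for a single
# file: FNR restarts at 1 for each file, so with several files the last line
# of one would run straight into the first line of the next.
strip_final_newline() {
    awk 'FNR > 1 { printf "\n" } { printf "%s", $0 }' "$@"
}

# Usage (bash, because of the process substitution):
#   brokentool -input <(strip_final_newline in.txt)
```

The function writes to stdout and leaves in.txt untouched, in keeping with the advice to change only the stream.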
The syntax <( command ) is called a process substitution. It looks to the caller like a file name (something like /dev/fd/63), but it is a temporary filesystem entry which produces the output of command when you read from it.
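To see what the caller actually receives, the substitution can be echoed, or redirected from like an ordinary file. Process substitution is not available in a plain sh, so this sketch invokes bash explicitly; the exact path it prints varies between systems and runs.

```shell
# The substitution expands to a path; bash on Linux typically gives /dev/fd/63.
bash -c 'echo <(true)'

# Reading from that path yields the command's output: 4 bytes here ("a\nb\n").
bash -c 'wc -c < <(printf "a\nb\n")'
```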
This syntax is not POSIX; it is supported by Bash (as well as ksh and zsh), but not by a plain sh such as dash.
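Where Bash is not available (for example when a script runs under dash as /bin/sh), a temporary file can stand in for the process substitution. This is a sketch under assumptions: run_without_final_newline is a hypothetical helper, and mktemp is not specified by POSIX, although Linux, macOS and the BSDs ship it.

```shell
# Run a command with a newline-stripped copy of INPUT appended as its last
# argument, then delete the copy. Returns the command's exit status.
#   run_without_final_newline INPUT COMMAND [ARGS...]
run_without_final_newline() {
    input=$1
    shift
    tmp=$(mktemp) || return 1
    # Same Awk filter as in the answer: emit a line's newline only when another line follows.
    awk 'FNR > 1 { printf "\n" } { printf "%s", $0 }' "$input" > "$tmp" || {
        rm -f "$tmp"
        return 1
    }
    "$@" "$tmp"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Usage (brokentool is the answer's placeholder):
#   run_without_final_newline in.txt brokentool -input
```

Unlike the process substitution, this writes a real file, so it also works with tools that need to seek in their input.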
The logic of the Awk script should be reasonably straightforward; we print a newline if we are not currently processing the first line (so as to add a newline to the previous line we processed, when there was a previous line) and then print the current line without a newline. The net effect is that we "borrow" the newline from the next line and put it back except when the line is the last in the file and so there's nothing to borrow from.
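To watch the borrowing happen, the same program can print a visible marker just before each borrowed newline. This is only an illustration; the marker <NL> is an arbitrary choice.

```shell
# Same logic as the answer's Awk program, but each borrowed newline is
# preceded by "<NL>" so its position is visible.
printf 'line1\nline2\nline3\n' |
    awk 'FNR > 1 { printf "<NL>\n" } { printf "%s", $0 }'
# Output (no marker and no newline after line3):
# line1<NL>
# line2<NL>
# line3
```

Because nothing follows line3, the shell prompt will appear directly after it.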