Reputation: 11
I have a file that is read by an application on both Unix and Windows. However, I run into problems when reading it on Windows because of ^M characters in the middle of the data. I only want to remove the ^M characters that occur inside a line, such as those in field 4 and field 5.
I have tried using perl -pe 's/\cM\cJ?//g', but it joins everything into one line, which I don't want. I want each record to stay on its own line, with only the extra ^M characters removed:
# Comment^M
# field1_header|field2_header|field3_header|field4_header|field5_header|field6_header^M
#^M
field1|field2|field3|fie^Mld4|fiel^Md5|field6^M
^M
Upvotes: 1
Views: 7769
Reputation: 67890
It sounds like the easiest solution might be to check your filetype before moving between Unix and Windows. dos2unix and unix2dos might be what you really need, instead of a regex.
^M is how most tools display a carriage return, which is \015 or \r. So s/\r//g should suffice. Keep in mind that it also removes the carriage return at the end of each line, if that is something you wish to preserve.
Upvotes: 0
Reputation: 26861
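use strict;
use warnings;
# Sample record containing real carriage returns (\r), which editors display as ^M.
my $line = "field1|field2|field3|fie\rld4|fiel\rd5|field6\r";
# Remove every CR except one at the very end of the line.
$line =~ s/\r(?!$)//g;
print $line;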
Upvotes: 0
Reputation: 107789
To remove only the CRs in the middle of a line:
perl -pe 's/\r(?!\n)//g'
You can also write this as perl -pe 's/\cM(?!\cJ)//g'. The (?!...) construct is a negative look-ahead assertion: the pattern matches a CR, but only when it is not followed by an LF.
Of course, if producing a file with unix newlines is acceptable, you can simply strip all CR characters:
perl -pe 'tr/\015//d'
What you wrote, s/\cM\cJ?//g, strips each CR together with the LF after it, if there is one, because the LF is part of the matched pattern. That is why all your lines end up joined into one.
Upvotes: 1