user3893064

Reputation: 1

concatenate pairs of lines without ^M appearing

My file (temp.txt) looks like this:

-2011-10-07 11:30:01
00 ///// ///// ///// 00000C00
-2011-10-07 11:30:17
00 ///// ///// ///// 00000C00
-2011-10-07 11:30:32
00 ///// ///// ///// 00000C00
-2011-10-07 11:30:46
00 ///// ///// ///// 00000C00

I want to concatenate each pair of lines so it looks like this:

-2011-10-07 11:30:01 00 ///// ///// ///// 00000C00
-2011-10-07 11:30:17 00 ///// ///// ///// 00000C00
-2011-10-07 11:30:32 00 ///// ///// ///// 00000C00
-2011-10-07 11:30:46 00 ///// ///// ///// 00000C00

However every method I've tried (sed, awk, paste) inserts ^M between the pairs like this:

-2011-10-07 11:30:01^M 00 ///// ///// ///// 00000C00
-2011-10-07 11:30:17^M 00 ///// ///// ///// 00000C00
-2011-10-07 11:30:32^M 00 ///// ///// ///// 00000C00
-2011-10-07 11:30:46^M 00 ///// ///// ///// 00000C00

In vi the ^M appears in blue and can be removed manually, but not by pattern matching: searching for it gives a "pattern not found" error. sed and awk haven't worked either. When the file is opened in gedit or exported to a spreadsheet, the carriage returns make it appear as in the first file. Since my file is much larger than the segment shown here, and I have 6 months of daily files to analyse, manual removal is not an option. Please help!

Upvotes: 0

Views: 69

Answers (3)

Ed Morton

Reputation: 204164

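As others have pointed out, the problem is not the tools you are using to process your input file but the tool that generated it: the file has DOS line endings, with a carriage return before every newline. Strip the carriage returns before joining the lines: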

tr -d '\r' < file | awk '{ORS=(NR%2?FS:RS)}1'

or with GNU awk for multi-char RS:

awk -v RS='\r\n' '{ORS=(NR%2?FS:"\n")}1' file
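
To see why the first command works, here is a minimal sketch that reproduces the problem on a two-line sample and then applies the fix. It assumes a POSIX shell with od, tr and awk available; the temporary file and the variable names are made up for the example.

```shell
# Build a two-line sample with DOS (CRLF) line endings, as in the question.
sample=$(mktemp)
printf '%s\r\n' '-2011-10-07 11:30:01' '00 ///// ///// ///// 00000C00' > "$sample"

# od -c prints every byte; a \r in front of each \n confirms CRLF endings.
od -c "$sample"

# Strip the carriage returns, then join each pair of lines:
# on odd lines ORS is FS (a space), on even lines it is RS (a newline).
joined=$(tr -d '\r' < "$sample" | awk '{ORS=(NR%2?FS:RS)}1')
printf '%s\n' "$joined"

rm -f "$sample"
```

The last printf should show the two sample lines joined by a single space, with no ^M in between. If the input has an odd number of lines, the final line ends with a space instead of a newline.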

Upvotes: 0

konsolebox

Reputation: 75558

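Using sed. This handles both UNIX and DOS format input, so there is no need to run dos2unix on the file first (GNU sed is assumed for \r and \?). The second substitution removes the carriage return left at the end of each joined line:

sed 'N; s|\r\?\n| |; s|\r$||' file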

Output:

-2011-10-07 11:30:01 00 ///// ///// ///// 00000C00
-2011-10-07 11:30:17 00 ///// ///// ///// 00000C00
-2011-10-07 11:30:32 00 ///// ///// ///// 00000C00
-2011-10-07 11:30:46 00 ///// ///// ///// 00000C00

Upvotes: 0

that other guy

Reputation: 123570

The ^M form is called "caret notation", and represents a carriage return. Your files are using DOS end of line characters. Convert them to Unix format.

You can do this by running dos2unix on your input files, or by piping them through tr -d '\r'.

In vi (Vim) you could have used :%s/\r//g, and in GNU sed s/\r//g, to remove them automatically.
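
Since the question mentions six months of daily files, the conversion can be wrapped in a loop. This is only a sketch under assumptions: the function names, the .txt extension and the .joined output suffix are hypothetical and should be adapted to the real file layout.

```shell
# join_pairs FILE: print FILE with carriage returns removed and
# each pair of lines joined by a space.
join_pairs() {
    tr -d '\r' < "$1" | awk '{ORS=(NR%2?FS:RS)}1'
}

# join_all DIR: process every *.txt file in DIR, writing the result
# next to it with a .joined suffix (hypothetical naming scheme).
join_all() {
    for f in "$1"/*.txt; do
        [ -e "$f" ] || continue      # skip if the glob matched nothing
        join_pairs "$f" > "${f%.txt}.joined"
    done
}

# Usage, with a hypothetical directory name:
# join_all logs
```

Writing to a new file instead of overwriting keeps the originals intact in case the pairing turns out to be wrong for some files.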

Upvotes: 1
