Reputation: 63984

Converting lines in chunks into tab delimited

I have the following lines in 2 chunks (the real file has ~10K of them). In this example, each chunk contains 3 lines. The chunks are separated by an empty line, so they are like "paragraphs".

xox
91-233
chicago

koko
121-111
alabama

I want to turn it into tab-delimited lines, like so:

xox  91-233  chicago
koko 121-111 alabama

How can I do that?

I tried tr "\n" "\t", but it doesn't do what I want.

Upvotes: 5

Answers (5)

P....

Reputation: 18351

xargs -L3 < filename.log |tr ' ' '\t'
xox 91-233 chicago
koko 121-111 alabama

Upvotes: 3

mklement0

Reputation: 437197

This answer offers the following:

* It works with blocks of nonempty lines of any size, separated by any number of empty lines; John1024's helpful answer (which is similar and came first) works with blocks of lines separated by exactly one empty line.

* It explains the awk command used in detail.

A more idiomatic (POSIX-compliant) awk solution:

awk -v RS= -F '\n' -v OFS='\t' '$1=$1""' file

* -v RS= tells awk to operate in paragraph mode: consider each run of nonempty lines a single record; RS is the input record separator.
  - Note: The implication is that this solution treats one or more empty lines as separating paragraphs (line blocks); empty means no characters at all on the line, not even whitespace.
* -F '\n' tells awk to consider each line of an input paragraph its own field (breaks the multiline input record into fields by lines); -F sets FS, the input field separator.
* -v OFS='\t' tells awk to separate fields with \t (tab characters) on output; OFS is the output field separator.
* $1=$1"" looks like a no-op, but because it assigns to field variable $1 (the record's first field), it tells awk to rebuild the input record using OFS as the field separator, thereby effectively replacing the \n separators with \t.
  - The trailing "" guards against the edge case of the first line in a paragraph evaluating to 0 in a numeric context; appending "" forces treatment as a string, and any nonempty string, even one that contains "0", is considered true in a Boolean context (see below).

Given that $1 is by definition nonempty and that assignments in awk pass their value through, the result of the assignment $1=$1"" is also a nonempty string. The assignment is used as a pattern (a condition), a nonempty string is considered true, and there is no associated action block ({ ... }), so the implied action is to print the rebuilt input record, which now consists of the input lines separated with tabs, terminated by the default output record separator (ORS), \n.
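
As a quick check of the paragraph-mode behavior described above, the following sketch feeds the same command two blocks of different sizes separated by two empty lines. The sample data and the variable names `input` and `result` are made up for illustration.

```shell
# Hypothetical sample: a 3-line block, two empty lines, then a 2-line block.
input=$(printf 'xox\n91-233\nchicago\n\n\nkoko\n121-111\n')

# Paragraph mode (RS=) splits records on runs of empty lines;
# -F '\n' makes each line of a block its own field.
result=$(printf '%s\n' "$input" | awk -v RS= -F '\n' -v OFS='\t' '$1=$1""')

# Make the tabs visible by replacing each one with "<TAB>".
printf '%s\n' "$result" | sed "s/$(printf '\t')/<TAB>/g"
# prints:
# xox<TAB>91-233<TAB>chicago
# koko<TAB>121-111
```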

Upvotes: 4

Shravan Yadav

Reputation: 1317

Another awk version that does this, assuming every block has exactly 3 lines:

 awk 'NF { a = (i % 3 ? a "\t" : "") $0; if (++i % 3 == 0) print a }' input_file

Upvotes: 2

karakfa

Reputation: 67467

Another alternative:

$ sed '/^$/d' file | pr -3ats$'\t'

xox     91-233  chicago
koko    121-111 alabama

This removes empty lines with sed and prints the rest in 3 columns with a tab delimiter. For your real file, replace 3 with the number of lines per block.

Note that this will only work if all your blocks are of the same size.
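
To illustrate adjusting the column count, here is a minimal sketch for blocks of 2 lines. The sample data and the variable names `tab` and `columns` are assumptions for illustration, and the options are written separately rather than bundled as in the command above.

```shell
# A literal tab character, built portably with printf.
tab=$(printf '\t')

# Hypothetical sample with 2-line blocks: drop the empty lines,
# then fill 2 columns across (-a), without headers (-t), tab-separated (-s).
columns=$(printf 'a\nb\n\nc\nd\n' | sed '/^$/d' | pr -2 -a -t -s"$tab")

# Make the tabs visible by replacing each one with "<TAB>".
printf '%s\n' "$columns" | sed "s/$tab/<TAB>/g"
```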

Upvotes: 3

John1024

Reputation: 113824

$ awk -F'\n' '{$1=$1} 1' RS='\n\n' OFS='\t' file
xox     91-233  chicago
koko    121-111 alabama

How it works

Awk divides input into records and it divides each record into fields.

-F'\n'

This tells awk to use a newline as the field separator.

$1=$1

This tells awk to assign the first field to itself. While this seemingly does nothing, it causes awk to treat the record as changed. As a consequence, the record is rebuilt and printed using our assigned value for OFS, the output field separator.

1

This is awk's cryptic shorthand for "print the line".

RS='\n\n'

This tells awk to treat two consecutive newlines as a record separator.

OFS='\t'

This tells awk to use a tab as the field separator on output.
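
The effect of the $1=$1 assignment can be seen in isolation with a small sketch. It uses a made-up single-line input and a hyphen as OFS instead of the multi-line records above, purely to keep the difference visible; the variable names `unchanged` and `rebuilt` are illustrative.

```shell
# Without touching any field, awk prints the record as read; OFS is not applied.
unchanged=$(printf 'a b c\n' | awk -v OFS='-' '1')

# Assigning to $1 makes awk rebuild the record, joining the fields with OFS.
rebuilt=$(printf 'a b c\n' | awk -v OFS='-' '{$1=$1} 1')

printf '%s\n%s\n' "$unchanged" "$rebuilt"
# prints:
# a b c
# a-b-c
```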

Upvotes: 5
