shijie xu

Reputation: 2097

Bash command to merge two rows data

I have a three-column data file, and I want to transform the data for plotting, using bash. Note that the withop row does not always come first; sometimes the noop row is first. The sample data is:

printff withop     1
printff noop     0
partial_sums withop     1
partial_sums noop     1
fasta noop     1
fasta withop     1
word_anagrams withop     2
word_anagrams noop     2
list noop     0
list withop     8
gc_mb withop     1
gc_mb noop     1
simple_connect withop     0
simple_connect noop     0
binary_trees noop     2
binary_trees withop     2
cal noop     3
cal withop     6

The transformation I want is to merge every two rows that share the same value in the first column. The new format still has three columns: the second column is the withop value and the third is the noop value. For example, the new data is:

printff  1   0
partial_sums   1 1
....
list    8   0
...

Upvotes: 2

Views: 364

Answers (2)

mklement0

Reputation: 437953

If you can rely on related lines coming in pairs, here's a single-pass awk solution:

awk '{
  op1=$2; val1=$3
  getline
  val2=$3
  print $1 " " (op1 == "withop" ? val1 " " val2 : val2 " " val1)
}' file
  • op1=$2; val1=$3 stores the operation field ($2, the second whitespace-separated field) in variable op1, and the value field ($3, the third field) in variable val1.
  • getline reads the next line from the input file and re-splits it, so that $1, ... now reflect the fields of that second line.

    • While using getline is fine in this particular case, because the lines can be assumed to be paired, it has many pitfalls and is rarely the right choice - see http://awk.info/?tip/getline
  • val2=$3 then stores the second line's value field in variable val2.

  • print $1 " " (op1 == "withop" ? val1 " " val2 : val2 " " val1) then prints a single output line for the two lines at hand:
    • $1, the 1st field, is by definition the same on both lines, so we can use the 2nd line's value.
    • (op1 == "withop" ? val1 " " val2 : val2 " " val1) uses the C-style ternary operator (inline conditional) to print the 1st line's value before the 2nd line's if the first line's operation field was withop, and the other way around otherwise.
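
If the two rows for a name might not be adjacent, one possible alternative is to collect the values in an awk array and print them at the end. This is only a sketch, not part of the solution above: the function name merge_rows is hypothetical, and it assumes each name has at most one withop row and one noop row (a missing value prints as an empty field).

```shell
# Hypothetical helper (not from the original answer): merge_rows [FILE]
# Reads FILE, or stdin if no file is given.
merge_rows() {
  awk '
    # Record each name the first time it appears, to keep input order.
    !($1 in seen) { seen[$1] = 1; order[++n] = $1 }
    # Store the value under the (name, operation) pair.
    { val[$1, $2] = $3 }
    END {
      # Print: name, withop value, noop value.
      for (i = 1; i <= n; i++) {
        name = order[i]
        print name, val[name, "withop"], val[name, "noop"]
      }
    }
  ' "$@"
}
```

Called as merge_rows file, it avoids getline entirely, at the cost of holding all values in memory until the end of the input.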

Upvotes: 1

nsilent22

Reputation: 2863

Assuming your data is always correct, i.e. every name has exactly one withop row and one noop row:

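sort -k1,1 -k2,2 data.txt | while read -r a b c && read -r d e f; do echo "$a $f $c"; done

Update: sorting added, as withop and noop rows can be in any order. After sorting, each name's noop row comes before its withop row, so the withop value ($f) is printed before the noop value ($c).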

Upvotes: 0
