Justin Buchanan

Reputation: 117

How to repeat a sequence of numbers to the end of a column?

I have a data file that needs a new column of identifiers cycling from 1 to 5. The end goal is to split the data into five separate files with no leftover file (the split command leaves a leftover file).

Data:

aa
bb
cc
dd
ff
nn
ww
tt
pp

With the identifier column:

aa 1
bb 2
cc 3
dd 4
ff 5
nn 1
ww 2
tt 3
pp 4

Can this be done with seq? Afterwards, the file will be split with:

awk '$2 == 1 {print $0}' 
awk '$2 == 2 {print $0}' 
awk '$2 == 3 {print $0}' 
awk '$2 == 4 {print $0}' 
awk '$2 == 5 {print $0}' 

Upvotes: 0

Views: 123

Answers (3)

Ed Morton

Reputation: 203684

$ awk '{print $0, ((NR-1)%5)+1}' file
aa 1
bb 2
cc 3
dd 4
ff 5
nn 1
ww 2
tt 3
pp 4

Of course, you don't need that identifier column to create 5 separate files. All you need is:

awk '{print > ("file_" ((NR-1)%5)+1)}' file
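
To see what that one-liner produces, here is a minimal sketch that assumes the nine sample lines from the question; the scratch directory and the input name file are only for illustration:

```shell
# Recreate the nine sample lines from the question in a scratch directory
# (the directory and the name "file" are illustrative assumptions).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' aa bb cc dd ff nn ww tt pp > file

# Round-robin split: line 1 -> file_1, ..., line 5 -> file_5, line 6 -> file_1, ...
awk '{print > ("file_" ((NR-1)%5)+1)}' file

# Show which line landed in which output file (grep prefixes each line
# with its file name when given several files).
grep '' file_[1-5]
```

With that input, file_1 through file_4 each get two lines and file_5 gets one (ff), so there is no sixth leftover file.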

It looks like you're happy with a Perl solution that outputs 1-4 and then 0 instead of 1-5, so FYI, here's the equivalent in awk:

$ awk '{print $0, NR%5}' file
aa 1
bb 2
cc 3
dd 4
ff 0
nn 1
ww 2
tt 3
pp 4

Upvotes: 1

Hunter McMillen

Reputation: 61510

I am going to offer a Perl solution even though the question wasn't tagged Perl, because Perl is well suited to this problem.

If I understand what you want to do, you have a single file that you want to split into 5 separate files based on the position of a line in the data file:

the first line in the data file goes to file 1
the second line in the data file goes to file 2 
the third line in the data file goes to file 3 
...

Since you already know each line's position in the file, you don't really need the identifier column (though you could pursue that solution if you wanted).

Instead, you can open 5 filehandles and simply cycle through them, writing each line to the next handle in turn:

use strict;
use warnings; 

my $datafilename = shift @ARGV; 

# open filehandles and store them in an array 
my @fhs;
foreach my $i ( 0 .. 4 ) {
   open my $fh, '>', "${datafilename}_$i"
      or die "$!";
   $fhs[$i] = $fh;
}

# open the datafile 
open my $datafile_fh, '<', $datafilename 
   or die "$!";

my $row_number = 0;
while ( my $datarow = <$datafile_fh> ) {
   print { $fhs[$row_number++ % @fhs] } $datarow;
}

# close resources
foreach my $fh ( @fhs ) {
   close $fh; 
}

Upvotes: 1

choroba

Reputation: 241928

Perl to the rescue:

perl -pe 's/$/" " . $. % 5/e' < input > output

Note that this outputs 0 instead of 5 on every fifth line.

  • $. is the line number.
  • % is the modulo operator.
  • The /e modifier tells the substitution to evaluate the replacement part as code.

That is, the end of the line ($) is replaced with a space concatenated (.) with the line number modulo 5.

Upvotes: 3
