Reputation: 1108
I have a data file with more than 40,000 columns. In the header, each column name begins with a prefix (c1, c2, ..., cn), and each prefix covers one or more columns; for example, c1 has two columns. I need to delete the first column of each prefix group. For example, if the input looks like this:
input:
c1.20022 c1.31012 c2.44444 c2.87634 c2.22233 c3.00444 c3.44444
1 1 0 1 0 0 0 1
2 0 1 0 0 1 0 1
3 0 1 0 0 1 1 0
4 1 0 1 0 0 1 0
5 1 0 1 0 0 1 0
6 1 0 1 0 0 1 0
I need the output to look like this:
c1.31012 c2.87634 c2.22233 c3.44444
1 0 0 0 1
2 1 0 1 1
3 1 0 1 0
4 0 0 0 0
5 0 0 0 0
6 0 0 0 0
Any suggestions, please?
Update: What if there are no spaces between the digits in each row (which is the real situation in my data set)? That is, my real data looks like this:
input:
c1.20022 c1.31012 c2.44444 c2.87634 c2.22233 c3.00444 c3.44444
1 1010001
2 0100101
3 0100110
4 1010010
5 1010010
6 1010010
and output:
c1.31012 c2.87634 c2.22233 c3.44444
1 0001
2 1011
3 1010
4 0000
5 0000
6 0000
Upvotes: 0
Views: 53
Reputation: 241968
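Perl solution: it first reads the header line, uses a regex to extract each column name's prefix (the part before the dot), and builds a list of the column indices to keep. A column is kept when its prefix equals the previous column's prefix, so the first column of every group is dropped. It then uses those indices to print only the wanted columns from the header and from the remaining lines.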
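#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

my @header = split ' ', <>;
my $last = q();
my @keep;
for my $i (0 .. $#header) {
    my ($prefix) = $header[$i] =~ /(.*)\./;
    # Keep a column only if it shares its prefix with the previous one.
    # The + 1 accounts for the row number that leads every data line.
    if ($prefix eq $last) {
        push @keep, $i + 1;
    }
    $last = $prefix;
}

# Shift the header by one so the same indices work for it, too.
unshift @header, q();
say join "\t", @header[@keep];

while (<>) {
    my @columns = split;
    # Index 0 is the row number, which the expected output keeps.
    say join "\t", @columns[0, @keep];
}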
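As a quick check of the index bookkeeping, here is a minimal sketch that runs the same prefix comparison on the sample header from the question. The helper name `keep_indices` is made up for illustration and is not part of the script above; it returns zero-based header positions, so the row-number offset still has to be added when indexing into space-separated data lines.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

# Hypothetical helper: return the zero-based positions of the header
# fields whose prefix (the text before the dot) equals the prefix of
# the field immediately before them.
sub keep_indices {
    my @header = @_;
    my $last = q();
    my @keep;
    for my $i (0 .. $#header) {
        my ($prefix) = $header[$i] =~ /(.*)\./;
        # Assumption: a name without a dot counts as its own prefix.
        $prefix = $header[$i] unless defined $prefix;
        push @keep, $i if $prefix eq $last;
        $last = $prefix;
    }
    return @keep;
}

my @header = split ' ',
    'c1.20022 c1.31012 c2.44444 c2.87634 c2.22233 c3.00444 c3.44444';
my @keep = keep_indices(@header);

say "@keep";            # prints: 1 3 4 6
say "@header[@keep]";   # prints: c1.31012 c2.87634 c2.22233 c3.44444
```

The first column of each group never matches the prefix before it, which is why it drops out without any explicit "first of group" test.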
Update (for rows whose digits are not separated by spaces):
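#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

my @header = split ' ', <>;
my $last = q();
my @keep;
for my $i (0 .. $#header) {
    my ($prefix) = $header[$i] =~ /(.*)\./;
    # Keep a column only if it shares its prefix with the previous one.
    # No offset is needed here: the digits are indexed separately
    # from the row number.
    if ($prefix eq $last) {
        push @keep, $i;
    }
    $last = $prefix;
}
say join "\t", @header[@keep];

while (<>) {
    # Each line is a row number followed by one unbroken run of digits.
    my ($line_number, $all_digits) = split;
    my @digits = split //, $all_digits;
    say join "\t", $line_number, join q(), @digits[@keep];
}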
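With 40,000 digits per row, an alternative that may be worth trying is to pick the kept characters straight out of the string with `substr` instead of first splitting the whole row into a list. The following is only a sketch under the assumption that every row carries exactly one digit per header column; `filter_packed` is a hypothetical helper name, and the indices passed in are the zero-based ones the header loop collects in `@keep`.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

# Hypothetical helper: given a string of digits and a list of
# zero-based positions, return the digits at those positions,
# concatenated in the order given.
sub filter_packed {
    my ($digits, @keep) = @_;
    return join q(), map { substr($digits, $_, 1) } @keep;
}

# Rows 1 and 2 of the question's sample, with the indices kept
# for the sample header (1, 3, 4, 6).
say filter_packed('1010001', 1, 3, 4, 6);   # prints: 0001
say filter_packed('0100101', 1, 3, 4, 6);   # prints: 1011
```

Inside the `while` loop this would replace the `split //` line and the inner `join`, leaving the row number handling unchanged.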
Upvotes: 2