Reputation: 293
I have a file like this
I would like to extract the line with the 0 and 1 (all lines in the file) into a seperate file. However, the sequence does not have to start with a 0 but could also start with a 1. However, the line always comes directly after the line (SITE:). Moreover, I would like to extract the line SITTE itself into a seperate file. Could somebody tell me how that is doable in bash?
Upvotes: 1
Views: 1173
Reputation: 437062
Here's a simple awk
solution that matches all lines starting with SITE:
and outputs the respective next line:
awk '/^SITE:/ { if (getline) print }' infile > outfile
Simply omit the { ... }
block part to extract all lines starting with SITE:
themselves to a separate file:
awk '/^SITE:/' infile > outfile
If you wanted to combine both operations:
outfile1
and outfile2
are the names of the 2 output files, passed to awk
as variables f1
and f2
:
awk -v f1=outfile1 -v f2=outfile2 \
'/^SITE:/ { print > f1; if (getline) print > f2 }' infile
Upvotes: 1
Reputation: 3838
You could try something like :
$ egrep -o "^(0|1)+$" test.txt > test2.txt
$ cat test2.txt
0000000000001010000000000000010000000000000000000100000000000010000000000000000000000000000000000000
0000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000
0011010000000000001010000000000000001000010001000000001001001000011000000000000000101000101010101000
$ grep "^SITE:" test.txt > test3.txt
$ cat test3.txt
SITE: 0 0.000340988542 0.0357651018
SITE: 1 0.000529755514 0.00324293642
SITE: 2 0.000577745511 0.052214098
Another solution, using bash :
$ while read; do [[ $REPLY =~ ^(0|1)+$ ]] && echo "$REPLY"; done < test.txt > test2.txt
$ cat test2.txt
0000000000001010000000000000010000000000000000000100000000000010000000000000000000000000000000000000
0000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000
0011010000000000001010000000000000001000010001000000001001001000011000000000000000101000101010101000
To remove the characters 0
at beginning of the line :
$ egrep "^(0|1)+$" test.txt | sed "s/^0\{1,\}//g" > test2.txt
$ cat test2.txt
1010000000000000010000000000000000000100000000000010000000000000000000000000000000000000
1000000000000000000000000000000000000000000000000000000000
11010000000000001010000000000000001000010001000000001001001000011000000000000000101000101010101000
UPDATE : New file format provided in comments :
$ egrep "^SITE:" test.txt|egrep -o "(0|1)+$"|sed "s/^0\{1,\}//g" > test2.txt
$ cat test2.txt
100000000000000000000001000001000000000000000000000000000000000000
1010010010000000000111101000010000001001010111111100000000000010010001101010100011101011110011100
10000000000
$ egrep "^SITE:" test.txt|sed "s/[01\ ]\{1,\}$//g" > test3.txt
$ cat test3.txt
SITE: 967 0.189021866 0.0169990123
SITE: 968 0.189149593 0.246619149
SITE: 969 0.189172266 6.84752689e-05
Upvotes: 1
Reputation: 545508
Moreover, I would like to extract the line SITTE itself into a seperate file.
That’s the easy part:
grep '^SITE:' infile > outfile.site
Extracting the line after that is slightly harder:
grep --after-context=1 '^SITE:' infile \
| grep '^[01]*$' \
> outfile.nr
--after-context
(or -A
) specifies how many lines after the matching line to print as well. We then use the second grep
to print only that line, and not the actually matching line (nor the delimiter which grep
puts between each matching entry when specifying an after-context
).
Alternatively, you could use the following to match the numeric lines:
grep '^[01]*$' infile > outfile.nr
That’s much easier, but it will find all lines consisting solely of 0s and 1s, regardless of whether they come after a line which starts with SITE:
.
Upvotes: 1