Reputation: 3344
I have a comma-separated file. I would like to split every column from 15 to $NF (the 15th column to the last column) with the same split call, split($column,a,"-"), and print a[1] for every split column.
I can't figure out how to loop over the columns from the nth to the last and print the result for each of them.
awk -F',' -v OFS="\t" '{for(i;$15<i<$NF,i+1);split($i,a,"_"); print ???}' file.csv
Example of the file, printed from the 15th column onward:
NBPF1-chr1-16579269-16579502-MedEx,NBPF1-chr1-16580779-16580863-MedEx,NBPF1-chr1-16581333-16581592-MedEx,NBPF1-chr1-16582457-16582758-MedEx,NBPF1-chr1-16583499-16583796-MedEx
What I expect:
NBPF1,NBPF1,NBPF1,NBPF1,NBPF1
Thank you.
Upvotes: 3
Views: 961
Reputation: 5252
How about just replacing?
If you only want the first part after splitting, there is no need to split and save to a temporary variable:
awk -F, -v OFS="\t" '{for(i=15;i<=NF;i++)printf "%s" OFS, gensub(/-.*/,"",1,$i);print ""}' file.csv
It will create an empty column at the rightmost position. If you don't want that, use this:
awk -F, -v OFS="\t" '{for(i=15;i<NF;i++)printf "%s" OFS, gensub(/-.*/,"","g",$i);print gensub(/-.*/,"","g",$NF)}' file.csv
Replace "\t" with , if you want the output to be comma-separated.
It works with GNU awk; it needs an awk that implements gensub, which is a gawk extension.
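If your awk does not have gensub (mawk and the BSD/macOS awk, for example), a sketch of the same idea using only POSIX sub() might look like the following. It copies each field into a variable before calling sub(), so $0 is not touched, and it prints the separator only between fields, so there is no empty column at the end. The function name first_parts_tsv and its start-column argument (default 15) are hypothetical, chosen for illustration.

```shell
# Hypothetical helper: read comma-separated lines on stdin and print, for
# every field from column $1 (default 15) to the last, the text before the
# first "-", tab-separated. Uses only POSIX awk (sub), no gensub.
first_parts_tsv() {
  awk -F, -v OFS='\t' -v start="${1:-15}" '{
    for (i = start; i <= NF; i++) {
      v = $i                                   # copy the field so $0 is not modified
      sub(/-.*/, "", v)                        # delete the first "-" and everything after it
      printf "%s%s", v, (i < NF ? OFS : ORS)   # tab between fields, newline after the last
    }
  }'
}

# Usage (file.csv as in the question):
# first_parts_tsv 15 < file.csv
```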
Upvotes: 1
Reputation: 1126
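With awk you can get it by treating every - and , as a record separator and printing every fifth record (this relies on each column having exactly five parts):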
awk -v RS='[-,]' 'NR%5==1' file
NBPF1
NBPF1
NBPF1
NBPF1
NBPF1
Or exactly what you expect:
awk -v RS='[-,]' 'NR%5==1{printf "%s%s", sep, $0; sep=","} END{print ""}' file
NBPF1,NBPF1,NBPF1,NBPF1,NBPF1
Upvotes: 1
Reputation: 203229
$ awk '{gsub(/-[^,]*/,"")}1' file
NBPF1,NBPF1,NBPF1,NBPF1,NBPF1
$ sed 's/-[^,]*//g' file
NBPF1,NBPF1,NBPF1,NBPF1,NBPF1
If that's not really all you need, then please edit your question to provide more truly representative sample input/output.
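The gsub above works on the whole line. If the real file has columns before the 15th that may also contain -, one way to limit the change is to loop over fields 15 to NF and call sub() on each, then print the rebuilt record. A minimal sketch, assuming the earlier columns should be kept as they are; the function name strip_from_col is hypothetical:

```shell
# Hypothetical helper: strip "-..." only in columns $1 (default 15) through NF,
# leaving earlier columns untouched, and print the whole comma-separated line.
strip_from_col() {
  awk -v start="${1:-15}" 'BEGIN { FS = OFS = "," }
  {
    for (i = start; i <= NF; i++)
      sub(/-.*/, "", $i)    # editing a field makes awk rebuild $0 with OFS
  }
  1                         # print every (possibly modified) record
  '
}

# Usage: strip_from_col 15 < file
```

Setting both FS and OFS to "," means the rebuilt record keeps the original comma separators.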
Upvotes: 3
Reputation: 5252
Another approach, using only regex replacement:
awk '{gsub(/^([^,]*,){14}/,"")}gsub(/-[^,]*(,|$)/,"\t")' file.csv
This deletes the first 14 columns from $0, then replaces the - and everything after it in each remaining column with a tab.
Tested with GNU awk.
example input:
1,2,3,4,5,6,7,8,9,10,11,12,13,14,NBPF1-chr1-16579269-16579502-MedEx,NBPF1-chr1-16580779-16580863-MedEx,NBPF1-chr1-16581333-16581592-MedEx,NBPF1-chr1-16582457-16582758-MedEx,NBPF1-chr1-16583499-16583796-MedEx
1,2,3,4,5,6,7,8,9,10,11,12,13,14,NBPF1-chr1-16579269-16579502-MedEx,NBPF1-chr1-16580779-16580863-MedEx,NBPF1-chr1-16581333-16581592-MedEx,NBPF1-chr1-16582457-16582758-MedEx,NBPF1-chr1-16583499-16583796-MedEx
1,2,3,4,5,6,7,8,9,10,11,12,13,14,NBPF0-chr1-16579269-16579502-MedEx,NBPF1-chr1-16580779-16580863-MedEx,NBPF1-chr1-16581333-16581592-MedEx,NBPF1-chr1-16582457-16582758-MedEx,NBPF9-chr1-16583499-16583796-MedEx
output:
NBPF1 NBPF1 NBPF1 NBPF1 NBPF1
NBPF1 NBPF1 NBPF1 NBPF1 NBPF1
NBPF0 NBPF1 NBPF1 NBPF1 NBPF9
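Two details of the command above: {14} is a regex interval expression, which older awk implementations (for example older mawk releases) may not support, and the second gsub also turns the last column's -... into a tab, so each output line ends with a trailing tab. A sketch that avoids both, assuming the same 14 leading columns, removes one leading column at a time with sub() and converts the remaining commas to tabs at the end; the function name regex_first_parts is hypothetical:

```shell
# Hypothetical helper: drop the first $1 (default 14) comma-separated columns
# with repeated sub() calls (no {n} interval expression), strip "-..." from
# each remaining column, then turn the commas into tabs.
regex_first_parts() {
  awk -v skip="${1:-14}" '{
    for (k = 0; k < skip; k++)
      sub(/^[^,]*,/, "")    # remove one leading column and its comma
    gsub(/-[^,]*/, "")      # remove "-" and the rest of each column
    gsub(/,/, "\t")         # comma separators become tabs, none trailing
    print
  }'
}

# Usage: regex_first_parts 14 < file.csv
```

Unlike the original, which uses the second gsub as its pattern, this prints every line, even one that contains no -.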
Upvotes: 1
Reputation: 133458
With your shown samples, please try the following awk code. Change i=1 to i=15 (or to whichever field you want the loop to start from); the loop then runs until the last field of the current line.
awk '
BEGIN{
  FS=OFS=","
}
{
  value=""
  for(i=1;i<=NF;i++){
    split($i, a, /-/)
    value=(value?value OFS:"")a[1]
  }
  print value
}
' Input_file
Upvotes: 4
Reputation: 784998
You may use this awk:
awk 'BEGIN {FS=OFS=","} {for(i=1; i<=NF; ++i) {
split($i, a, /-/); printf "%s%s", a[1], (i<NF ? OFS : ORS)}}' file
NBPF1,NBPF1,NBPF1,NBPF1,NBPF1
Change i=1 to i=15, or to whatever field position you want to start extracting the dash-delimited values from.
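Instead of editing i=1 in the script, the start column can also be passed in with awk's -v option. A sketch of the same split() loop, where the variable name start and the function name split_first_parts are hypothetical:

```shell
# Hypothetical helper: for each field from column $1 (default 1) to NF,
# print the part before the first "-" (a[1] from split), comma-separated.
split_first_parts() {
  awk -v start="${1:-1}" 'BEGIN { FS = OFS = "," }
  {
    for (i = start; i <= NF; i++) {
      split($i, a, /-/)                          # a[1] is the text before the first "-"
      printf "%s%s", a[1], (i < NF ? OFS : ORS)  # comma between fields, newline after the last
    }
  }'
}

# Usage: split_first_parts 15 < file
```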
Upvotes: 3