Vonton

Reputation: 3344

awk split more columns and print first word

I have a comma-separated file and would like to split every column from the 15th to the last (column 15 to $NF) using the same condition, split($column,a,"-"), and print a[1] for each split column. I can't work out how to loop over the columns from the nth to the last and print the result for each of them.

awk -F',' -v OFS="\t" '{for(i;$15<i<$NF,i+1);split($i,a,"_"); print ???}' file.csv

Example of the file, printed from the 15th column:

NBPF1-chr1-16579269-16579502-MedEx,NBPF1-chr1-16580779-16580863-MedEx,NBPF1-chr1-16581333-16581592-MedEx,NBPF1-chr1-16582457-16582758-MedEx,NBPF1-chr1-16583499-16583796-MedEx

What I expect:

NBPF1,NBPF1,NBPF1,NBPF1,NBPF1

Thank you.

Upvotes: 3

Views: 961

Answers (7)

Tyl

Reputation: 5252

How about just replacing?
If you only want the first part after the split, there is no need to split and store the pieces in a temporary array:

awk -F, -v OFS="\t" '{for(i=15;i<=NF;i++)printf "%s" OFS, gensub(/-.*/,"",1,$i);print ""}' file.csv

It will leave a trailing separator (an empty column) at the right end of each line; if you don't want that, use this instead:

awk -F, -v OFS="\t" '{for(i=15;i<NF;i++)printf "%s" OFS, gensub(/-.*/,"","g",$i);print gensub(/-.*/,"","g",$NF)}' file.csv

Replace "\t" with "," if you want the output to be comma-separated.
This requires GNU awk, because gensub is a gawk extension.
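For an awk without gensub, the same idea can be sketched with the standard sub() function applied to a copy of each field. This is only a sketch: the sample line and the variable names (line, result, f) are made up for illustration, with 14 filler fields in front of two dash-delimited ones.

```shell
# Hypothetical sample line: 14 filler fields followed by two dash-delimited fields.
line='1,2,3,4,5,6,7,8,9,10,11,12,13,14,NBPF1-chr1-16579269-16579502-MedEx,NBPF9-chr1-16580779-16580863-MedEx'

# Copy each field from the 15th on, strip everything from the first "-",
# and print OFS only between fields so no trailing separator appears.
result=$(printf '%s\n' "$line" | awk -F, -v OFS="\t" '{
  for (i = 15; i <= NF; i++) {
    f = $i                 # work on a copy so the record is not rebuilt
    sub(/-.*/, "", f)      # drop the first "-" and everything after it
    printf "%s%s", f, (i < NF ? OFS : ORS)
  }
}')
printf '%s\n' "$result"
```

Working on a copy (f) rather than on $i itself avoids having awk rebuild $0 with OFS on every assignment.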

Upvotes: 1

Carlos Pascual

Reputation: 1126

With awk you can get it:

awk -v RS='[-,]' 'NR%5==1' file
NBPF1
NBPF1
NBPF1
NBPF1
NBPF1

Or exactly what you expect:

awk -v RS='[-,]' 'NR%5==1{printf "%s%s", sep, $0; sep=","} END{print ""}' file
NBPF1,NBPF1,NBPF1,NBPF1,NBPF1

Upvotes: 1

Ed Morton

Reputation: 203229

$ awk '{gsub(/-[^,]*/,"")}1' file
NBPF1,NBPF1,NBPF1,NBPF1,NBPF1

$ sed 's/-[^,]*//g' file
NBPF1,NBPF1,NBPF1,NBPF1,NBPF1

If that's not really all you need then please edit your question to provide more truly representative sample input/output.

Upvotes: 3

user14473238

Reputation:

cut -d, -f15- file | sed 's/-[^,]*//g'

Upvotes: 3

Tyl

Reputation: 5252

Another approach, using only regex replacement:

awk '{gsub(/^([^,]*,){14}/,"")}gsub(/-[^,]*(,|$)/,"\t")' file.csv

This one first deletes the first 14 columns from $0, then deletes the - and everything after it in each remaining column.
Tested with GNU awk.

example input:

1,2,3,4,5,6,7,8,9,10,11,12,13,14,NBPF1-chr1-16579269-16579502-MedEx,NBPF1-chr1-16580779-16580863-MedEx,NBPF1-chr1-16581333-16581592-MedEx,NBPF1-chr1-16582457-16582758-MedEx,NBPF1-chr1-16583499-16583796-MedEx
1,2,3,4,5,6,7,8,9,10,11,12,13,14,NBPF1-chr1-16579269-16579502-MedEx,NBPF1-chr1-16580779-16580863-MedEx,NBPF1-chr1-16581333-16581592-MedEx,NBPF1-chr1-16582457-16582758-MedEx,NBPF1-chr1-16583499-16583796-MedEx
1,2,3,4,5,6,7,8,9,10,11,12,13,14,NBPF0-chr1-16579269-16579502-MedEx,NBPF1-chr1-16580779-16580863-MedEx,NBPF1-chr1-16581333-16581592-MedEx,NBPF1-chr1-16582457-16582758-MedEx,NBPF9-chr1-16583499-16583796-MedEx

output:

NBPF1   NBPF1   NBPF1   NBPF1   NBPF1
NBPF1   NBPF1   NBPF1   NBPF1   NBPF1
NBPF0   NBPF1   NBPF1   NBPF1   NBPF9

Upvotes: 1

RavinderSingh13

Reputation: 133458

With your shown samples, please try the following awk code. Change i=1 to i=15, or to whichever field you want the loop to start from; it runs through the last field of the current line.

awk '
BEGIN{
  FS=OFS=","
}
{
  value=""
  for(i=1;i<=NF;i++){
    split($i, a, /-/)
    value=(value?value OFS:"")a[1]
  }
  print value
}
'  Input_file

Upvotes: 4

anubhava

Reputation: 784998

You may use this awk:

awk 'BEGIN {FS=OFS=","} {for(i=1; i<=NF; ++i) {
split($i, a, /-/); printf "%s%s", a[1], (i<NF ? OFS : ORS)}}' file

NBPF1,NBPF1,NBPF1,NBPF1,NBPF1

Change i=1 to i=15 or whatever field position you want to start extracting - delimited values from.
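Rather than editing the script each time, the starting field can be passed in with -v. A minimal sketch, assuming a hypothetical variable named start and a shortened, made-up sample line in which the dash-delimited values begin at field 3:

```shell
# Hypothetical short sample: two filler fields, then two dash-delimited fields.
sample='a,b,NBPF1-chr1-1-2-MedEx,NBPF2-chr1-3-4-MedEx'

# start is an illustrative variable name; use start=15 for the file in the question.
out=$(printf '%s\n' "$sample" | awk -v start=3 'BEGIN {FS=OFS=","} {
  for (i = start; i <= NF; ++i) {
    split($i, a, "-")      # a[1] holds the text before the first "-"
    printf "%s%s", a[1], (i < NF ? OFS : ORS)
  }
}')
printf '%s\n' "$out"
```

With start=15 this should behave like the command above once i=1 has been changed to i=15.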

Upvotes: 3
