Sayan Roy

Reputation: 43

Search for the first occurrence of a keyword + print the next column in Linux

Suppose I have a file like this:

d,e,c,g,v,c,w,r
g,c,d,c,s,c,g,r
d,y,c,w,t,g,c,f

Now I want to print the column (without the comma delimiter) that appears just after the first 'c' in each row. So my output will look like this:

g
d
w

I have tried the code:

awk -F"," '{for (i=1;i<=NF;i++) if ($i == "c") {print $(i+1)};}' filename

But the output contains the column that appears after every 'c'. I only want the column that appears after the first 'c'. How can I solve this, preferably using awk?

Thanks in advance

Upvotes: 3

Views: 364

Answers (8)

Sundeep

Reputation: 23667

With ripgrep

$ rg -No 'c,([^,]+).*' -r '$1' ip.txt
g
d
w

$ # if you only want to match whole column
$ rg -No '(^|,)c,([^,]+).*' -r '$2' ip.txt
g
d
w
  • -N to disable line number prefix in output
  • ([^,]+) to capture the column content
  • .* match everything after to avoid multiple matches in a line
  • -r '$1' replace matched portion with only content of capture group
  • (^|,) to ensure only whole column is matched

Upvotes: 1

Cyrus

Reputation: 88573

with GNU awk:

awk '{split($2,array,","); print array[2]}' FS="c" file

Output:

g
d
w

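I used awk's field separator (FS) to split each row at c, so $1 is the text before the first c and $2 is the text that follows it (up to the next c, if any). With split I then split $2 at , into an array (array) and printed its second element. The first element is empty because $2 begins with a comma.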
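To make the intermediate values visible, here is a minimal sketch that prints $2 next to the extracted column. The helper name show_split is hypothetical, and the rows are the sample data from the question, fed through standard input.

```shell
# show_split is a hypothetical helper; it reads rows from standard input ("-").
show_split() {
  # FS="c" splits each row at every c; $2 is the piece after the first c.
  awk '{ split($2, array, ","); print "$2=[" $2 "] -> " array[2] }' FS="c" -
}

printf '%s\n' 'd,e,c,g,v,c,w,r' 'g,c,d,c,s,c,g,r' 'd,y,c,w,t,g,c,f' | show_split
# $2=[,g,v,] -> g
# $2=[,d,] -> d
# $2=[,w,t,g,] -> w
```

Because the row is split at the character c rather than at a whole column, a multi-character column that merely contains a c (for example abc) would also act as a separator, so this approach fits best when every column is a single character, as in the question.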

Upvotes: 1

mtnezm

Reputation: 1027

Another option:

$ awk -F'c,' '{ print $2 }' < filename |cut -d, -f1
g
d
w

Upvotes: 1

alani

Reputation: 13049

A perl one:

perl -pe 's/^.*?c,(.).*/$1/' filename

If it is not guaranteed that the input contains a c on every line, then this version will filter out any lines that do not:

perl -ne 'if (/c,/) { s/^.*?c,(.).*/$1/; print }' filename

Upvotes: 0

Léa Gris

Reputation: 19545

A sed solution:

sed -n 's/[^c]*c,\([^,]\).*/\1/p' filename

RegEx101 running this

Upvotes: 6

pastadisaster

Reputation: 77

The sed / RegEx answer would be something like

sed 's/[^c]*,c,\([^,]*\),.*/\1/' filename > outfile

Should also work for multi-character entries.

Upvotes: 0

RavinderSingh13

Reputation: 133428

Assuming only one c occurs per line, could you please try the following? It doesn't need a loop, and it matches either a lowercase or an uppercase c.

awk 'match($0,/[cC],[^,]*/){
  print substr($0,RSTART+2,RLENGTH-2)
}
' Input_file

Explanation: match() is given a regex that matches a lowercase or uppercase c, the comma after it, and everything up to the next comma. When the regex matches, awk sets the variables RSTART and RLENGTH: RSTART is the position where the match starts and RLENGTH is the total length of the match. Using these values, substr() prints the part of the current line that follows the c and its comma.
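As an illustration of what RSTART and RLENGTH hold, this sketch prints both next to the extracted value. The helper name show_match is hypothetical; the regex and the substr() call are the ones from the answer.

```shell
# show_match is a hypothetical helper; it reads rows from standard input.
show_match() {
  awk 'match($0, /[cC],[^,]*/) {
    # RSTART is the 1-based start of the match, RLENGTH is its length.
    # Skipping 2 characters drops the c and the comma that follows it.
    print "RSTART=" RSTART, "RLENGTH=" RLENGTH, "value=" substr($0, RSTART + 2, RLENGTH - 2)
  }'
}

printf '%s\n' 'd,e,c,g,v,c,w,r' | show_match
# RSTART=5 RLENGTH=3 value=g
```

match() reports the leftmost match, so only the column after the first c is printed even when the row has several.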

Upvotes: 4

MichalH

Reputation: 1074

Use the awk keyword next to skip to the next line after the first "c" found on each line:

$ awk -F"," '{for (i=1;i<=NF;i++) if ($i == "c") {print $(i+1);next};}' filename
g
d
w
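A possible variant, assuming you would rather skip a row whose only c is in the last column than print an empty line for it: stop the loop one column before the end. The helper name first_after_c and the extra row a,b,c are made up for this sketch.

```shell
# first_after_c is a hypothetical helper; it reads rows from standard input.
first_after_c() {
  awk -F"," '{
    # Loop only to NF-1 so that $(i+1) always refers to an existing column.
    for (i = 1; i < NF; i++)
      if ($i == "c") { print $(i + 1); next }
  }'
}

printf '%s\n' 'd,e,c,g,v,c,w,r' 'g,c,d,c,s,c,g,r' 'd,y,c,w,t,g,c,f' 'a,b,c' | first_after_c
# g
# d
# w
```

The last row produces no output because its c has no column after it.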

Upvotes: 5
