Reputation: 123

grep: Keeping lines that have a specific string in a certain column

I am trying to pick out the lines that have a certain value in a certain column and save them to an output file. I am trying to do this with grep. Is it possible?

My data looks like this:

apple   5   abcdefd  ewdsf
peach   5   ewtdsfe  wtesdf
melon   1   ewtedf   wersdf
orange  3   qqqwetr  hredfg

I want to pick out the lines that have the value 5 in the 2nd column and save them to a new output file.

apple   5   abcdefd  ewdsf
peach   5   ewtdsfe  wtesdf

I would appreciate any help!

Upvotes: 5

Answers (5)

Michaël Le Barbier

Reputation: 6478

It is probably possible with grep, but the appropriate tool for this operation is definitely awk. You can select every line that has 5 in the second column with

awk '$2 == 5'

Explanation

awk splits its input into records (usually lines) and fields (usually columns) and performs actions on the records that match certain conditions. Here

awk '$2 == 5'

is a short form for

awk '$2 == 5 {print($0)}'

which translates to

For each record, if the second field ($2) is 5, print the full record ($0).
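As a minimal sketch of the filter in use, assuming the sample data from the question is stored in a file named data.txt and the result should go to output.txt (both file names are assumptions for illustration), the matching lines can be saved with ordinary shell redirection:

```shell
# Recreate the sample data from the question (data.txt is an assumed name).
cat > data.txt <<'EOF'
apple   5   abcdefd  ewdsf
peach   5   ewtdsfe  wtesdf
melon   1   ewtedf   wersdf
orange  3   qqqwetr  hredfg
EOF

# Keep only the records whose second field equals 5 and save them.
awk '$2 == 5' data.txt > output.txt

# Show what was kept.
cat output.txt
```

The comparison here is numeric, so a second field such as 55 does not match, while 5.0 would. If an exact textual match is wanted instead, comparing against a string constant ('$2 == "5"') forces a string comparison.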

Variations

If you need to choose the key value used for filtering dynamically, use the -v option of awk:

awk -v "key=5" '$2 == key {print($0)}'
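A short sketch of how the -v form might be driven from a shell variable. The variable name wanted and the file names data.txt and output.txt are hypothetical and chosen only for this example:

```shell
# Sample data from the question (data.txt is an assumed name).
cat > data.txt <<'EOF'
apple   5   abcdefd  ewdsf
peach   5   ewtdsfe  wtesdf
melon   1   ewtedf   wersdf
orange  3   qqqwetr  hredfg
EOF

# Hypothetical shell variable holding the value to look for.
wanted=3

# Pass the shell value into awk as the awk variable "key".
awk -v key="$wanted" '$2 == key' data.txt > output.txt

cat output.txt
```

Passing the value with -v keeps the awk program itself in single quotes, so the shell never has to splice text into the awk code.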

If you need to keep the first line of the file because it contains a header to the table, use the NR variable that keeps track of the ordinal number of the current record:

awk 'NR == 1 || $2 == 5'
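To illustrate the header case, here is a small sketch that assumes the table starts with a header line. The header text and the file names table.txt and output.txt are made up for this example:

```shell
# Sample data with an assumed header line on top.
cat > table.txt <<'EOF'
fruit   count  code1    code2
apple   5      abcdefd  ewdsf
peach   5      ewtdsfe  wtesdf
melon   1      ewtedf   wersdf
EOF

# Keep record number 1 (the header) plus every record whose second field is 5.
awk 'NR == 1 || $2 == 5' table.txt > output.txt

cat output.txt
```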

The field separator is a regular expression defining which text separates columns; it can be changed with the -F option. For instance, if your data were in a basic CSV file, the filter would be

awk -F", *" '$2 == 5'
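A sketch of the CSV variant on a small made-up file (data.csv and its contents are assumptions, not data from the question). The separator matches a comma followed by any number of spaces, so both "a,5" and "a, 5" style rows are handled:

```shell
# Hypothetical CSV data, with and without a space after the commas.
cat > data.csv <<'EOF'
apple,5,abcdefd,ewdsf
peach, 5, ewtdsfe, wtesdf
melon, 1, ewtedf, wersdf
EOF

# Split on a comma followed by optional spaces, then filter on field 2.
awk -F", *" '$2 == 5' data.csv > output.txt

cat output.txt
```

This only suits simple CSV: a quoted field that itself contains a comma would be split in the wrong place.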

Visit the awk tag wiki to find some useful information to get started learning awk.

Upvotes: 7

AkihikoTakahashi

Reputation: 159

You can use the following command.

$ cat data.txt
apple   5   abcdefd  ewdsf
peach   5   ewtdsfe  wtesdf
melon   1   ewtedf   wersdf
orange  3   qqqwetr  hredfg
grape   55  kkkkkkk  aaaaaa

$ grep -E '^[^ ]+ +5( |$)' data.txt > output.txt

$ cat output.txt
apple   5   abcdefd  ewdsf
peach   5   ewtdsfe  wtesdf

You can get the answer with grep alone, but I strongly recommend using awk.

Upvotes: 0

Etan Reisner

Reputation: 81052

To print when the second field is 5 use: awk '$2==5' file

Upvotes: 4

David C. Rankin

Reputation: 84642

The simple way to do it is:

grep '5' MyDataFile

The result:

apple   5   abcdefd  ewdsf
peach   5   ewtdsfe  wtesdf

To capture that in a new file:

grep '5' MyDataFile > newfile

Note: that will find a 5 anywhere in MyDataFile, including in another column or as part of a larger number. To restrict the match to the second column, a quick script like the following will do. Usage: script number datafile

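#!/bin/bash

# Read each line into three variables; "stuff" receives the rest of the line.
# The || test keeps a final line that lacks a trailing newline.
while read -r fruit num stuff || [ -n "$fruit" ]; do
    # Print the line only when the second column equals the requested number.
    [ "$num" -eq "$1" ] && printf "%s  %s  %s\n" "$fruit" "$num" "$stuff"
done <"$2"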

output:

$ ./fruit.sh 5 dat/mydata.dat

apple  5  abcdefd  ewdsf
peach  5  ewtdsfe  wtesdf

Upvotes: -2

Fordio

Reputation: 3830

Give this a try:

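grep -E '^[^[:space:]]+[[:space:]]+5([[:space:]]|$)' file.txt

The pattern looks for the start of the line, followed by one or more non-whitespace characters, followed by one or more whitespace characters, followed by 5, followed by either whitespace or the end of the line. POSIX character classes are used because \s is not treated as whitespace inside a bracket expression, and the final group keeps values such as 55 from matching.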
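As a rough check of a column-anchored grep, the sketch below runs one possible pattern against data that includes two tricky rows: a 55 in the second column and a 5 in a later column. The file names file.txt and output.txt and the extra rows are assumptions added for illustration:

```shell
# Sample data plus two edge cases (the last column of "melon" and the "grape" row).
cat > file.txt <<'EOF'
apple   5   abcdefd  ewdsf
peach   5   ewtdsfe  wtesdf
melon   1   ewtedf   5
orange  3   qqqwetr  hredfg
grape   55  kkkkkkk  aaaaaa
EOF

# ^                 start of line
# [^[:space:]]+     first column: one or more non-whitespace characters
# [[:space:]]+      the gap between column 1 and column 2
# 5                 the wanted value
# ([[:space:]]|$)   whitespace or end of line, so 55 is not accepted
grep -E '^[^[:space:]]+[[:space:]]+5([[:space:]]|$)' file.txt > output.txt

cat output.txt
```

Only the apple and peach rows should be kept. The pattern grows quickly if the wanted column is further to the right, which is one reason the awk approach is usually easier to maintain.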

Upvotes: 0
