jhourback

Reputation: 4571

Always include first line in grep

I often grep CSV files with column names on the first line. Therefore, I want the output of grep to always include the first line (to get the column names) as well as any lines matching the grep pattern. What is the best way to do this?

Upvotes: 84

Views: 34892

Answers (12)

user1488660

Reputation: 65

Use a set of Aliases (bash aliases in this example)

alias grep+="(read line; echo \"\$line\" >&2; cat) | grep "

And for more than one line

alias grep++="(read line; echo \"\$line\" >&2; cat) | grep+ "
alias grep+++="(read line; echo \"\$line\" >&2; cat) | grep++ "
alias grep++++="(read line; echo \"\$line\" >&2; cat) | grep+++ "
# continue as needed, but I really only need up to four for most things

Other answers here use non-grep commands such as sed or awk, which are not grep and thus have different options, or they use shell scripts or functions to wrap the actual grep. Scripts will work for simple things, but anything 'interesting' will require escaping of the arguments. The end result is not really grep-like unless the use is very simple.

This is just a set of aliases that run grep as-is and work as expected. The header (first) line(s) are pushed to stderr while grep writes to stdout. Standard grep, with no modification, then runs on the remainder.

#grep iscdhcp leases for 'iPhone'
#  leases list has 3 header lines
dhcp-lease-list  |grep+++ iPhone
Reading leases from /var/lib/dhcp/dhcpd.leases
MAC                IP              hostname       valid until         manufacturer
===============================================================================================
02:63:28:b2:aa:bb  100.64.28.93    iPhone         2025-02-01 00:04:03 -NA-
0a:42:ec:3a:aa:bb  172.16.65.83    iPhone         2025-02-01 00:55:05 -NA-
1a:0f:1d:b0:aa:bb  172.16.65.32    iPhone         2025-02-01 00:40:01 -NA-
1a:9d:4d:09:aa:bb  100.64.28.18    iPhone         2025-02-01 00:42:32 -NA-

For manual (visual) use this works fine. It's easy to use and remember.

The one caveat is that if you are sending the grep+ output to something else, you need to merge stderr and stdout into just stdout by adding 2>&1 after a grouped grep+ command.

# this shows the first three lines on the terminal and sends only the grep'ed lines to cat
# (the redirect to a file just makes this easier to see)
dhcp-lease-list  |grep+++ iPhone |cat > /tmp/leases.txt
# to re-merge the streams, add 2>&1 after the command grouped in ( )
#  this will send the full output to cat
dhcp-lease-list  |(grep+++ iPhone) 2>&1 |cat > /tmp/leases.txt

So it's not perfect, but it's much closer than the other answers currently given.

Upvotes: 0

levidos

Reputation: 97

We can use sed -u in a subshell to "pinch off" the first N lines and print them.

The rest of the content is then processed by the next command.

What's nice here is that the content is not duplicated: each line is received either by sed or by the next command.

(sed -u 1q; command2)

Using file as input:

# Print the first line unchanged
(sed -u 1q; grep sys) < /etc/passwd

# Filter out the comments and empty lines, but keep the shebang
(sed -u 1q; grep -vE '^\s*#|^$') < /etc/init.d/urandom

Using pipe as input:

# Keep the header of ps
ps aux | (sed -u 1q; grep bash)

# Don't sort the header of netstat
netstat -lnt | (sed -u 2q; sort -r)
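
The examples above depend on sed reading its input unbuffered, which is what -u does in GNU sed. Where that can't be assumed, the same pinch-off can be sketched with the shell's own read, which consumes input only up to the end of each line. The function name keep_head is made up for this illustration; treat it as a sketch rather than a drop-in tool.

```shell
# keep_head N CMD [ARGS...]   (hypothetical helper, not from the answer above)
# Copies the first N lines of stdin through unchanged, then runs CMD on the rest.
keep_head() {
    n=$1; shift
    i=0
    # read stops at the newline, so the remaining input stays available to CMD
    while [ "$i" -lt "$n" ] && IFS= read -r line; do
        printf '%s\n' "$line"
        i=$((i + 1))
    done
    "$@"
}

# Example: netstat -lnt | keep_head 2 sort -r
```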

Upvotes: 0

zainengineer

Reputation: 13889

For files

head -n 1 file.csv ; grep MyValue file.csv

For commands

ps -aux | (head -n 1 ; grep index) | grep -v grep

For watch

watch "ps -aux | (head -n 1 ; grep index) | grep -v grep"

Upvotes: 0

kev

Reputation: 161674

sed:

sed -e '1b' -e '/pattern/!d' input.txt

awk:

awk 'NR==1 || /pattern/' input.txt

grep1:

grep1() { awk -v pattern="${1:?pattern is empty}" 'NR==1 || $0~pattern' "${2:-/dev/stdin}"; }

Upvotes: 90

Sven

Reputation: 534

All the answers are correct. Here is just another idea for grepping the output of a command (rather than a file) while including the first line ;-)

df -h | grep -E '(^Filesystem|/mnt)'  # <<< returns usage of devices with mountpoint '/mnt/...'
ps aux | grep -E '(^USER|grep)'       # <<< returns all grep processes

The -E option of grep enables extended regular expressions. The pattern uses |, which is interpreted as "or", so in the df example we look for lines:

  • starting with Filesystem (the leading '^' in the first subexpression means "line starts with")
  • or containing /mnt

Another way is to write the output to a temporary file and grep its content as shown in other posts. This can be helpful if you don't know the content of the first line.

head -1 <file> && grep ff <file>
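
The temporary-file approach for command output could look roughly like this. It is only a sketch: the helper name with_header_grep is invented for the example, and mktemp is assumed to be available.

```shell
# with_header_grep PATTERN CMD [ARGS...]   (hypothetical helper)
# Runs CMD, stores its output in a temporary file, then prints the first
# line followed by the matching lines from the rest of the output.
with_header_grep() {
    pattern=$1; shift
    tmp=$(mktemp) || return 1
    "$@" > "$tmp"
    head -n 1 "$tmp"
    tail -n +2 "$tmp" | grep -e "$pattern"
    status=$?          # keep grep's exit status (1 means no matches)
    rm -f "$tmp"
    return "$status"
}

# Example: with_header_grep /mnt df -h
```

Grepping only from line 2 onward means the header is never printed twice, even when it happens to match the pattern.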

Upvotes: 0

Alex North-Keys

Reputation: 4363

So, I posted a completely different short answer above a while back.

However, for those pining for a command that looks like grep in terms of taking all the same options (although this script requires you to use the long options if an optarg is involved) and that can cope with weird characters in filenames, etc., have fun pulling this apart.

Essentially it's a grep that always emits the first line. If you think a file with no matching lines should skip emitting that first (header) line, well, that's left as an exercise for the reader. I saved it as grep+1.

#!/bin/bash
# grep+1 [<option>...] [<regex>] [<file>...]
# Emits the first line of each input and ignores it otherwise.
# For grep options that have optargs, only the --forms will work here.

declare -a files options
regex_seen=false
regex=

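double_dash_seen=false
for arg in "$@" ; do
    is_file_or_rx=true
    case "$arg" in
        --) # the first bare "--" ends option parsing; it is passed on to grep as an option
            if ! $double_dash_seen ; then
                double_dash_seen=true
                is_file_or_rx=false
            fi ;;
        -*) is_file_or_rx=$double_dash_seen ;;
    esac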
    if $is_file_or_rx ; then
        if ! $regex_seen ; then
            regex="$arg"
            regex_seen=true
        else
            files[${#files[*]}]="$arg"     # append the value
        fi
    else
        options[${#options[*]}]="$arg"     # append the value       
    fi
done

# We could either open files all at once in the shell and pass the handles into
# one grep call, but that would limit how many we can process to the fd limit.
# So instead, here's the simpler approach with a series of grep calls

if $regex_seen ; then
    if [ ${#files[@]} -gt 0 ] ; then
        for file in "${files[@]}" ; do
            head -n 1 "$file"
            tail -n +2 "$file" | grep --label="$file" "${options[@]}" "$regex" 
        done
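    else
        # stdin: pass the first line through, then grep the rest
        IFS= read -r first_line && printf '%s\n' "$first_line"
        grep "${options[@]}" "$regex"
    fi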
else
    grep "${options[@]}"   # probably --help
fi

#--eof
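
For the exercise mentioned before the script (skipping the header when nothing matches), one possible approach for a single file is sketched below. The function name grep_header_if_match is hypothetical, and unlike the script it takes only a pattern and one file, with no grep options.

```shell
# grep_header_if_match PATTERN FILE   (hypothetical helper)
# Prints the header and the matching lines only if at least one line
# after the header matches; otherwise prints nothing and returns 1.
grep_header_if_match() {
    pattern=$1 file=$2
    # Collect matches from line 2 onward; grep exits non-zero when none are found
    matches=$(tail -n +2 "$file" | grep -e "$pattern") || return 1
    head -n 1 "$file"
    printf '%s\n' "$matches"
}

# Example: grep_header_if_match Incoming foo.csv
```

Because the matches are held in a shell variable, this suits modest file sizes rather than very large inputs.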

Upvotes: 1

Eyal Levin

Reputation: 18406

Another option:

$ cat data.csv | (read line; echo "$line"; grep SEARCH_TERM)

Example:

$ printf 'title\nvalue1\nvalue2\nvalue3\n' | (read line; echo "$line"; grep value2)

Output:

title
value2

Upvotes: 15

Aaron McDaid

Reputation: 27133

This is a very general solution, for example if you want to sort a file while keeping the first line in place. Basically, "pass the first line through as-is, then do whatever I want (awk/grep/sort/whatever) on the rest of the data."

Try this in a script, perhaps calling it keepfirstline (don't forget chmod +x keepfirstline and to put it in your PATH):

#!/bin/bash
IFS='' read -r JUST1LIINE
printf "%s\n" "$JUST1LIINE"
exec "$@"

It can be used as follows:

cat your.data.csv | keepfirstline grep SearchTerm > results.with.header.csv

or perhaps, if you want to filter with awk

cat your.data.csv | keepfirstline awk '$1 < 3' > results.with.header.csv

I often like to sort a file while keeping the header on the first line:

cat your.data.csv | keepfirstline sort

keepfirstline executes the command it's given (grep SearchTerm), but only after reading and printing the first line.

Upvotes: 3

scibuff

Reputation: 13755

Just do

head -1 <filename> 

and then execute grep

Upvotes: 0

Alex North-Keys

Reputation: 4363

grep doesn't really have a concept of line number, but awk does, so here's an example that outputs the lines containing "Incoming", plus the first line, whatever it is:

awk 'NR == 1 || /Incoming/' foo.csv

You could make a script (a bit excessive, but). I made a file, grep+1, and put this in it:

#!/bin/sh
pattern="$1" ; shift
exec awk 'NR == 1 || /'"$pattern"'/' "$@"

Now one can:

./grep+1 Incoming

edit: removed the "{print;}", which is awk's default action.

Upvotes: 21

Adam Liss

Reputation: 48290

You can use sed instead of grep to do this:

sed -n -e '1p' -e '/pattern/p' < "$FILE"

This will print the first line twice, however, if it happens to contain the pattern.

-n tells sed not to print each line by default.
-e '1p' prints the first line.
-e '/pattern/p' prints each line that matches the pattern.
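
If the duplicate header is a problem, one possible variant deletes line 1 right after printing it, so the pattern test only ever sees the later lines. This is a sketch, not part of the original command: the wrapper name sed_header is made up, and the pattern is assumed not to contain a slash, since it is pasted into the sed script as-is.

```shell
# sed_header PATTERN [FILE...]   (hypothetical wrapper)
# Prints line 1 exactly once, then every later line matching PATTERN.
sed_header() {
    pattern=$1; shift
    # 1{p;d;} prints the first line and starts the next cycle immediately,
    # so line 1 never reaches the /pattern/p command.
    sed -n -e '1{p;d;}' -e "/$pattern/p" "$@"
}

# Example: sed_header pattern "$FILE"
```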

Upvotes: 10

DigitalRoss

Reputation: 146043

You could include an alternate pattern match for one of the column names. If a column is called COL, then this would work:

$ grep -E 'COL|pattern' file.csv

Upvotes: 30
