Reputation: 4571
I often grep CSV files with column names on the first line. Therefore, I want the output of grep to always include the first line (to get the column names) as well as any lines matching the grep pattern. What is the best way to do this?
Upvotes: 84
Views: 34892
Reputation: 65
Use a set of Aliases (bash aliases in this example)
alias grep+="(read line; echo \"\$line\" >&2; cat) | grep "
And for more than one line
alias grep++="(read line; echo \"\$line\" >&2; cat) | grep+ "
alias grep+++="(read line; echo \"\$line\" >&2; cat) | grep++ "
alias grep++++="(read line; echo \"\$line\" >&2; cat) | grep+++ "
# continue as needed, but I really only need up to four for most things
Other answers here use non-grep commands such as sed or awk, which have different options, or they wrap the real grep in shell scripts or functions. Scripts work for simple cases, but anything 'interesting' requires escaping the arguments, so the end result is not really grep-like unless the use is very simple.
This is just a set of aliases that run grep as is and works as expected. The header (first) line(s) are pushed to StdErr while grep runs on StdOut. Standard grep, with no modification, then runs on the remainder.
#grep iscdhcp leases for 'iPhone'
# leases list has 3 header lines
dhcp-lease-list |grep+++ iPhone
Reading leases from /var/lib/dhcp/dhcpd.leases
MAC IP hostname valid until manufacturer
===============================================================================================
02:63:28:b2:aa:bb 100.64.28.93 iPhone 2025-02-01 00:04:03 -NA-
0a:42:ec:3a:aa:bb 172.16.65.83 iPhone 2025-02-01 00:55:05 -NA-
1a:0f:1d:b0:aa:bb 172.16.65.32 iPhone 2025-02-01 00:40:01 -NA-
1a:9d:4d:09:aa:bb 100.64.28.18 iPhone 2025-02-01 00:42:32 -NA-
For manual (visual) use this works fine. It's easy to use and remember.
The one caveat: if you are sending the grep+ output to something else, you need to merge StdErr and StdOut into just StdOut by adding 2>&1 after a grouped grep+ command.
# this will show the first three lines, and only send the grep'ed lines to cat
# the redirect to a file is just so it is easier to see
dhcp-lease-list |grep+++ iPhone |cat > /tmp/leases.txt
# to remerge the streams, use 2>&1 after the command wrapped in ( )
# this will send the full output to cat
dhcp-lease-list |(grep+++ iPhone) 2>&1 |cat > /tmp/leases.txt
So it's not perfect, but it's much closer to plain grep than the other (current) answers.
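To make the StdErr/StdOut split easier to see, here is a minimal sketch of the same pipeline. It is written as a function (hypothetical name grep_plus) only because bash does not expand aliases in non-interactive scripts by default; the sample data and the sample function are made up for illustration.

```shell
# Hypothetical sample input: one header line followed by data.
sample() {
    printf '%s\n' 'MAC IP hostname' 'aa:bb 10.0.0.1 iPhone' 'cc:dd 10.0.0.2 laptop'
}

# Same pipeline as the grep+ alias: header goes to stderr, grep output to stdout.
grep_plus() {
    (read line; echo "$line" >&2; cat) | grep "$@"
}

# Capturing stdout alone loses the header (stderr is discarded here).
matches_only=$(sample | grep_plus iPhone 2>/dev/null)

# Merging stderr into stdout keeps the header and the matches together.
merged=$(sample | (grep_plus iPhone) 2>&1)

printf 'stdout only:\n%s\n\nmerged:\n%s\n' "$matches_only" "$merged"
```

The header is written before cat starts forwarding the rest of the input, so it comes first in the merged output.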
Upvotes: 0
Reputation: 97
We can use sed -u in a subshell to "pinch off" the first N lines and print them; the rest of the content is then processed by the next command.
What's nice here is that content is not duplicated: each line is received either by sed or by the next command.
(sed -u Nq; command2)
Using file as input:
# Print the first line unchanged
(sed -u 1q; grep sys) < /etc/passwd
# Filter out the comments and empty lines, but keep the shebang
(sed -u 1q; grep -vE '^\s*#|^$') < /etc/init.d/urandom
Using pipe as input:
# Keep the header of ps
ps aux | (sed -u 1q; grep bash)
# Don't sort the header of netstat
netstat -lnt | (sed -u 2q; sort -r)
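A small self-contained run of the same idea, assuming GNU sed (the -u option is a GNU extension) and made-up CSV data:

```shell
# Hypothetical CSV data; the first line is the header.
csv_data() {
    printf '%s\n' 'name,qty' 'apple,3' 'banana,5' 'cherry,7'
}

# sed -u prints line 1 and quits; grep then sees only the remaining lines.
filtered=$(csv_data | (sed -u 1q; grep banana))

# Same trick with sort: the header stays on top, the rest is sorted in reverse.
sorted=$(csv_data | (sed -u 1q; sort -r))

printf '%s\n\n%s\n' "$filtered" "$sorted"
```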
Upvotes: 0
Reputation: 13889
For files
head -n 1 file.csv ; grep MyValue file.csv
For commands
ps -aux | (head -n 1 ; grep index) | grep -v grep
For watch
watch "ps -aux | (head -n 1 ; grep index) | grep -v grep"
Upvotes: 0
Reputation: 161674
sed '1p;/pattern/!d' input.txt
awk 'NR==1 || /pattern/' input.txt
grep1() { awk -v pattern="${1:?pattern is empty}" 'NR==1 || $0~pattern' "${2:-/dev/stdin}"; }
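A quick usage sketch for the grep1 function above, with made-up data in a temp file. One thing to keep in mind: awk processes backslash escapes in -v values, so a pattern containing backslashes may need them doubled.

```shell
grep1() { awk -v pattern="${1:?pattern is empty}" 'NR==1 || $0~pattern' "${2:-/dev/stdin}"; }

# Hypothetical sample file.
csv=$(mktemp)
printf '%s\n' 'id,name' '1,foo' '2,bar' > "$csv"

from_file=$(grep1 bar "$csv")     # file given as second argument
from_stdin=$(grep1 foo < "$csv")  # no file: falls back to /dev/stdin
rm -f "$csv"

printf '%s\n\n%s\n' "$from_file" "$from_stdin"
```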
Upvotes: 90
Reputation: 534
All answers were correct. Just another idea: when you grep the output of a command (rather than a file), including the first line can be done like this ;-)
df -h | grep -E '(^Filesystem|/mnt)' # <<< returns usage of devices, with mountpoint '/mnt/...'
ps aux | grep -E '(^USER|grep)' # <<< returns all grep-process
The -E option of grep enables extended regular expressions. The | in the pattern can be interpreted as an "or", so in the df example we look for lines that:
start with Filesystem (the leading '^' in the first subexpression means "line starts with")
contain /mnt
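The alternation can be tried without a real df by feeding grep some canned lines. The fake_df function and its output are invented for this sketch:

```shell
# Hypothetical stand-in for `df -h` output.
fake_df() {
    printf '%s\n' \
        'Filesystem Size Used Avail Use% Mounted on' \
        '/dev/sda1 50G 20G 30G 40% /' \
        '/dev/sdb1 1T 200G 800G 20% /mnt/data'
}

# Keep lines that start with "Filesystem" OR contain "/mnt".
filtered=$(fake_df | grep -E '^Filesystem|/mnt')
printf '%s\n' "$filtered"
```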
Another way could be to write the output into a temp file and grep the content as shown in other posts. This can be helpful if you don't know the content of the first line.
head -1 <file> && grep ff <file>
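A rough sketch of the temp-file variant, assuming mktemp is available. list_mounts is a made-up stand-in for the real command, and tail -n +2 is used so that a header that happens to match is not printed twice:

```shell
# Hypothetical stand-in for a command whose first line is a header.
list_mounts() {
    printf '%s\n' \
        'Filesystem Size Mounted on' \
        '/dev/sda1 50G /' \
        '/dev/sdb1 1T /mnt/data'
}

tmp=$(mktemp)            # temp file to hold the command output
list_mounts > "$tmp"

# Header first, then grep only the lines after it.
result=$(head -n 1 "$tmp"; tail -n +2 "$tmp" | grep /mnt)
rm -f "$tmp"

printf '%s\n' "$result"
```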
Upvotes: 0
Reputation: 4363
So, I posted a completely different short answer above a while back.
However, for those pining for a command that looks like grep in terms of taking all the same options (although this script requires you to use the long options if an optarg is involved), and can cope with weird characters in filenames, etc, etc.. have fun pulling this apart.
Essentially it's a grep that always emits the first line. If you think a file with no matching lines should skip emitting that first (header) line, well, that's left as an exercise for the reader. I saved it as grep+1.
#!/bin/bash
# grep+1 [<option>...] [<regex>] [<file>...]
# Emits the first line of each input and ignores it otherwise.
# For grep options that have optargs, only the --forms will work here.
declare -a files options
regex_seen=false
regex=
double_dash_seen=false
for arg in "$@" ; do
    is_file_or_rx=true
    case "$arg" in
        --) # the first bare -- ends the options; later args are regex/files
            if ! $double_dash_seen ; then
                double_dash_seen=true
                continue
            fi ;;
        -*) is_file_or_rx=$double_dash_seen ;;
    esac
    if $is_file_or_rx ; then
        if ! $regex_seen ; then
            regex="$arg"
            regex_seen=true
        else
            files[${#files[*]}]="$arg" # append the value
        fi
    else
        options[${#options[*]}]="$arg" # append the value
    fi
done
# We could open all the files at once in the shell and pass the handles into
# one grep call, but that would limit how many we can process to the fd limit.
# So instead, here's the simpler approach with a series of grep calls
if $regex_seen ; then
    if [ ${#files[@]} -gt 0 ] ; then
        for file in "${files[@]}" ; do
            head -n 1 "$file"
            tail -n +2 "$file" | grep --label="$file" "${options[@]}" -e "$regex"
        done
    else
        # stdin: pass the first line through, then grep the rest
        IFS= read -r header && printf '%s\n' "$header"
        grep "${options[@]}" -e "$regex"
    fi
else
    grep "${options[@]}" # probably --help
fi
#--eof
Upvotes: 1
Reputation: 18406
Another option:
$ cat data.csv | (read line; echo "$line"; grep SEARCH_TERM)
Example:
$ printf 'title\nvalue1\nvalue2\nvalue3\n' | (read line; echo "$line"; grep value2)
Output:
title
value2
Upvotes: 15
Reputation: 27133
This is a very general solution, for example if you want to sort a file while keeping the first line in place. Basically, "pass the first line through as-is, then do whatever I want (awk/grep/sort/whatever) on the rest of the data."
Try this in a script, perhaps calling it keepfirstline (don't forget chmod +x keepfirstline and to put it in your PATH):
#!/bin/bash
IFS='' read -r JUST1LIINE
printf "%s\n" "$JUST1LIINE"
exec "$@"
It can be used as follows:
cat your.data.csv | keepfirstline grep SearchTerm > results.with.header.csv
or perhaps, if you want to filter with awk
cat your.data.csv | keepfirstline awk '$1 < 3' > results.with.header.csv
I often like to sort a file while keeping the header on the first line:
cat your.data.csv | keepfirstline sort
keepfirstline executes the command it's given (grep SearchTerm), but only after reading and printing the first line.
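If several header lines need to be kept, the same idea can be sketched as a shell function that takes a count. keep_first_lines is a hypothetical name, and the command is run directly rather than via exec because this is a function, not a standalone script.

```shell
# keep_first_lines N CMD [ARGS...]
# Pass the first N lines through unchanged, then run CMD on the rest of stdin.
keep_first_lines() {
    local n=$1 line
    shift
    while [ "$n" -gt 0 ] && IFS= read -r line; do
        printf '%s\n' "$line"
        n=$((n - 1))
    done
    "$@"
}

# Made-up data: one header line, then unsorted values.
sorted=$(printf '%s\n' 'name' 'pear' 'apple' 'fig' | keep_first_lines 1 sort)
printf '%s\n' "$sorted"
```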
Upvotes: 3
Reputation: 4363
grep doesn't really have a concept of line number, but awk does, so here's an example to output lines containing "Incoming" - and the first line, whatever it is:
awk 'NR == 1 || /Incoming/' foo.csv
You could make a script (a bit excessive, but). I made a file, grep+1, and put this in it:
#!/bin/sh
pattern="$1" ; shift
exec awk 'NR == 1 || /'"$pattern"'/' "$@"
Now one can:
./grep+1 Incoming
edit: removed the "{print;}", which is awk's default action.
Upvotes: 21
Reputation: 48290
You can use sed instead of grep to do this:
sed -n -e '1p' -e '/pattern/p' < "$FILE"
This will print the first line twice, however, if it happens to contain the pattern.
-n tells sed not to print each line by default.
-e '1p' prints the first line.
-e '/pattern/p' prints each line that matches the pattern.
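One possible way around the duplicate, offered as a sketch rather than the only fix: print line 1 and then delete it, so the pattern test never runs on the header. The data is made up so that the header itself contains the pattern.

```shell
# The header contains "pattern" on purpose, to show it is printed only once.
out=$(printf '%s\n' 'pattern,header' 'foo,1' 'pattern,2' |
    sed -n -e '1{p;d;}' -e '/pattern/p')

printf '%s\n' "$out"
```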
Upvotes: 10
Reputation: 146043
You could include an alternate pattern that matches one of the column names. If a column was called COL then this would work:
$ grep -E 'COL|pattern' file.csv
Upvotes: 30