Reputation: 55
I'm trying to print specific information from a file with a fixed format. Each line of the file looks like this: id|lastName|firstName|gender|birthday|creationDate|locationIP|browserUsed
I want to print just the firstName values, sorted and unique.
I specifically want to call the script (let's call it script.sh) with these arguments:
./script.sh --firstnames -f <file>
My code so far is the following:
--firstnames )
OlIFS=$IFS
content=$(cat "$3" | grep -v "#")
content=$(cat "$3" | tr -d " ") #cut -d " " -f6 )
for i in $content
do
IFS="|"
first=( $i )
echo ${first[2]}
IFS=$OlIFS
done | sort | uniq
;;
esac
For example, the following file:
#id|lastName|firstName|gender|birthday|creationDate|locationIP|browserUsed
933|Perera|Mahinda|male|1989-12-03|2010-03-17T13:32:10.447+0000|192.248.2.12|Firefox
1129|Lepland|Carmen|female|1984-02-18|2010-02-28T04:39:58:781+0000|81.25.252.111|Internet Explorer
is supposed to produce the output:
Carmen
Mahinda
One problem I've noticed is that the script prints the comment line too. For the file above it prints:
Carmen
firstnames
Mahinda
even though I've used grep to get rid of the lines starting with "#". This is only part of the code (the part where I believe the problem is); it's supposed to handle the "--firstnames" option. Since some fields in the file contain spaces, specifically the last one (the browser field), I wanted to remove just that field. This is for a school project, and according to the program that grades this part, the output is all wrong. As far as I can tell from my own tests, though, the script works. I don't know what's wrong, so I don't know what to correct. Please help!
Upvotes: 1
Views: 929
Reputation: 3089
awk would be best for your case
$ awk -F "|" 'FNR>1 && !a[$3]++{print $3}' file | sort
Carmen
Mahinda
-F "|" : sets | as the field delimiter when reading fields from the file.
FNR>1 : skips the first (header) line.
a[$3]++ : creates an associative array keyed by the string in the 3rd field/column (firstName) and increments its value by 1 each time the key is found. However, the value of $3 is printed only when !a[$3]++ is true, i.e. when the key doesn't exist in the array yet because it is being read for the first time.
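As a minimal sketch of the same de-duplication idiom, assuming comment lines can appear anywhere in the file and not only on the first line, the FNR>1 test can be swapped for a pattern that skips every line starting with #. The function name unique_firstnames, the array name seen, and the temporary sample file are made up for illustration; the first data row is repeated on purpose to show that duplicates are dropped.

```shell
# Hypothetical helper: print each distinct 3rd field (firstName) once, sorted.
# !/^#/        skips every line that starts with '#'
# !seen[$3]++  is true only the first time a given first name is read
unique_firstnames() {
    awk -F '|' '!/^#/ && !seen[$3]++ { print $3 }' "$1" | sort
}

# Sample data from the question, with the first row repeated for illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
#id|lastName|firstName|gender|birthday|creationDate|locationIP|browserUsed
933|Perera|Mahinda|male|1989-12-03|2010-03-17T13:32:10.447+0000|192.248.2.12|Firefox
1129|Lepland|Carmen|female|1984-02-18|2010-02-28T04:39:58:781+0000|81.25.252.111|Internet Explorer
933|Perera|Mahinda|male|1989-12-03|2010-03-17T13:32:10.447+0000|192.248.2.12|Firefox
EOF

unique_firstnames "$sample"
# Carmen
# Mahinda

rm -f "$sample"
```

Because awk splits only on |, the space in "Internet Explorer" does not matter here, so there is no need to strip spaces first.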
Upvotes: 3
Reputation: 24802
grep -vE '^#' "$3" | cut -d'|' -f3
should be enough :
$ echo '#id|lastName|firstName|gender|birthday|creationDate|locationIP|browserUsed
> 933|Perera|Mahinda|male|1989-12-03|2010-03-17T13:32:10.447+0000|192.248.2.12|Firefox
> 1129|Lepland|Carmen|female|1984-02-18|2010-02-28T04:39:58:781+0000|81.25.252.111|Internet Explorer
>' | grep -vE '^#' | cut -d'|' -f3
Mahinda
Carmen
The grep command removes lines starting with # (the ^ anchors the match to the start of the line; -E selects extended regular expressions, although a pattern this simple also works without it). If you want to keep removing any line that contains a #, your current grep -v "#" is correct. The cut -d'|' -f3 command splits each line on the | delimiter and returns its third field.
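The pipeline above prints the names in file order, so to get the sorted, unique list the question asks for, a sort -u can be appended. Here is a rough sketch of how that might sit inside the script, assuming it is always called as ./script.sh --firstnames -f <file>; the function names print_firstnames and main are made up for illustration and are not from the original script.

```shell
# Hypothetical helper: first names (3rd field), comments dropped, sorted, unique.
print_firstnames() {
    grep -v '^#' "$1" | cut -d'|' -f3 | sort -u
}

# Hypothetical dispatcher: act only on "--firstnames -f <file>".
main() {
    case "$1" in
        --firstnames)
            if [ "$2" = "-f" ] && [ -n "$3" ]; then
                print_firstnames "$3"
            fi
            ;;
    esac
}

main "$@"
```

Note that in the code from the question, the second content=$(...) assignment reads the file again and overwrites the grep result, which is why the comment line still shows up. Chaining the commands in one pipeline, as here, avoids that.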
Upvotes: 1