Dimitris Delis

Reputation: 55

Printing specific parts from a file in shell

I'm trying to print specific information from a file with a fixed format (each line looks like this: id|lastName|firstName|gender|birthday|creationDate|locationIP|browserUsed). I want to print only the firstName values, sorted and unique. I specifically want to call the script (let's call it script.sh) with these arguments:

./script.sh --firstnames -f <file>

My code so far is the following :

--firstnames )
    OlIFS=$IFS
    content=$(cat "$3" | grep -v "#")
    content=$(cat "$3" | tr -d " ") #cut -d " " -f6 )
    for i in $content
    do
        IFS="|"
        first=( $i )
        echo ${first[2]}
        IFS=$OlIFS
    done | sort | uniq
    ;;
esac

For example, the following file:

#id|lastName|firstName|gender|birthday|creationDate|locationIP|browserUsed
933|Perera|Mahinda|male|1989-12-03|2010-03-17T13:32:10.447+0000|192.248.2.12|Firefox
1129|Lepland|Carmen|female|1984-02-18|2010-02-28T04:39:58:781+0000|81.25.252.111|Internet Explorer

is supposed to produce this output:

Carmen
Mahinda

One problem I've noticed is that the script also prints the comment line. The code above prints:

Carmen
firstnames
Mahinda

even though I've used grep to get rid of the lines starting with "#". This is only part of the code (the part where I believe the problem is); it's supposed to handle the "--firstnames" option. Since some fields in the file contain spaces, specifically the last one (the browser field), I wanted to remove just that field. This is for a school project, and according to the program that grades this part, the output is all wrong. As far as I can tell from my own tests, the script works otherwise. I don't know what's wrong, so I don't know what to correct. Please help!

Upvotes: 1

Views: 929

Answers (2)

Rahul Verma

Reputation: 3089

awk would be the best fit for your case:

$ awk -F "|" 'FNR>1 && !a[$3]++{print $3}' file | sort
Carmen
Mahinda

-F "|" : sets | as the field delimiter when splitting each line into fields
FNR>1 : skips the first (header) line
a[$3]++ : uses an associative array keyed by the string in the third field (firstName) and increments its value by 1 each time that key is seen. The value of $3 is printed only when !a[$3]++ is true, i.e. when the key is not yet in the array because the name is being read for the first time.
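
As a rough sketch of the same idiom, the variant below filters on a leading # instead of the line number, which assumes comment lines always start with # but may appear anywhere in the file. The function name first_names and the array name seen are illustrative only and do not come from the question.

```shell
#!/bin/sh
# first_names is a hypothetical helper name used only for illustration.
# It prints each distinct firstName (3rd |-separated field), sorted.
# With no file argument it reads standard input.
first_names() {
    # !/^#/       : skip comment lines wherever they appear
    # !seen[$3]++ : true only the first time a given name is encountered
    awk -F '|' '!/^#/ && !seen[$3]++ { print $3 }' "$@" | sort
}

# Demo on the sample data from the question.
first_names <<'EOF'
#id|lastName|firstName|gender|birthday|creationDate|locationIP|browserUsed
933|Perera|Mahinda|male|1989-12-03|2010-03-17T13:32:10.447+0000|192.248.2.12|Firefox
1129|Lepland|Carmen|female|1984-02-18|2010-02-28T04:39:58:781+0000|81.25.252.111|Internet Explorer
EOF
```

On the sample data this prints Carmen followed by Mahinda. Because awk splits on | only, the space in "Internet Explorer" does not need any special handling.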

Upvotes: 3

Aaron

Reputation: 24802

grep -vE '^#' "$3" | cut -d'|' -f3 should be enough:

$ echo '#id|lastName|firstName|gender|birthday|creationDate|locationIP|browserUsed
> 933|Perera|Mahinda|male|1989-12-03|2010-03-17T13:32:10.447+0000|192.248.2.12|Firefox
> 1129|Lepland|Carmen|female|1984-02-18|2010-02-28T04:39:58:781+0000|81.25.252.111|Internet Explorer
>' | grep -vE '^#' | cut -d'|' -f3
Mahinda
Carmen

The grep command removes lines starting with # (the ^ anchors the regular expression to the start of the line; -E enables extended regular expressions, which this simple pattern does not strictly require). If you would rather keep removing every line that contains a # anywhere, your current grep -v "#" is correct. The cut -d'|' -f3 command splits each line on the | delimiter and returns its third field.
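
For context, the header still shows up in the question's code because the second assignment to content re-reads the file with cat and overwrites the result of the grep. The following is a minimal sketch of how the whole --firstnames branch might look with the pipeline above plus sort -u for the sorted, unique output the question asks for. The function names print_firstnames and main are hypothetical, and the sketch assumes the arguments always arrive in the order --firstnames -f <file>.

```shell
#!/bin/sh
# Sketch only: print_firstnames and main are illustrative names.
print_firstnames() {
    grep -v '^#' "$1" |   # drop comment lines
        cut -d'|' -f3 |   # keep the third field (firstName)
        sort -u           # sort and remove duplicates in one step
}

# Minimal argument handling for: ./script.sh --firstnames -f <file>
# Assumes exactly this argument order, as in the question.
main() {
    case "$1" in
        --firstnames)
            if [ "$2" = "-f" ] && [ -r "$3" ]; then
                print_firstnames "$3"
            else
                echo "usage: script.sh --firstnames -f <file>" >&2
                return 1
            fi
            ;;
    esac
}

main "$@"
```

Since the lines are never split on whitespace, there is no need to change IFS or to strip spaces from the browser field.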

Upvotes: 1
