Whoppa

Reputation: 989

Checking for duplicate strings and discarding them?

I have created two files by parsing other files and extracting the relevant information. One of the files has lines that look like this:

Ahmed,Safdar:D433:181:20.40:30.00
Gonzales,Carlos:D433:7732:18.00:24.00
Thanhachammet,Chendrit:D500:5833:8.40:12.10
Bush,G:D500:8343:13.00:19.00

The other one:

343#2#8#011104
1958#2#9#011204
181##16#012404
773##4#012404

I want to check whether the third field of the colon-separated lines matches the first field of the pound-sign-separated lines. If it does, I want to generate a list of the lines that matched. I'm kind of stuck on how to do this. This is what I tried:

temp=$(mktemp)
dept=$(cut -d: -f3 "$tempDept")
pay=$(cut -d# -f1 "$tempPay")
if echo "$dept" | grep -w "$pay"; then
        cat "$dept" >> "$temp"
        cat "$pay" >> "$temp"
fi

Upvotes: 0

Views: 74

Answers (2)

user1907906

Use join. It expects both inputs to be sorted on the join field, so sort them first.

$ cat 1
Ahmed,Safdar:D433:181:20.40:30.00
Gonzales,Carlos:D433:7732:18.00:24.00
Thanhachammet,Chendrit:D500:5833:8.40:12.10
Bush,G:D500:8343:13.00:19.00

$ cat 2
343#2#8#011104
1958#2#9#011204
181##16#012404
773##4#012404

$ sort -t: -k3,3 1 > 1a

$ sed 's/#/:/g' 2 | sort -t: -k1,1 > 2a

$ cat 1a
Ahmed,Safdar:D433:181:20.40:30.00
Thanhachammet,Chendrit:D500:5833:8.40:12.10
Gonzales,Carlos:D433:7732:18.00:24.00
Bush,G:D500:8343:13.00:19.00

$ cat 2a
181::16:012404
1958:2:9:011204
343:2:8:011104
773::4:012404

$ join -t: -1 3 -2 1 1a 2a
181:Ahmed,Safdar:D433:20.40:30.00::16:012404
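
If you'd rather not keep 1a and 2a around, the same steps can be wrapped in a small function. This is only a sketch: the name join_matches is made up for illustration, it assumes mktemp is available (as in the question), and it sets LC_ALL=C so that sort and join agree on collation order.

```shell
# Hypothetical wrapper around the join steps; written for POSIX sh.
# $1 = colon-separated file, $2 = pound-separated file.
join_matches() {
    sorted=$(mktemp) || return 1
    # Sort only on the join field (field 3 of the colon-separated file).
    LC_ALL=C sort -t: -k3,3 "$1" > "$sorted"
    # Convert '#' to ':' and sort on field 1; the "-" operand makes join
    # read its second input from standard input.
    sed 's/#/:/g' "$2" | LC_ALL=C sort -t: -k1,1 |
        LC_ALL=C join -t: -1 3 -2 1 "$sorted" -
    rm -f "$sorted"
}
```

Calling `join_matches 1 2` should print the same joined line as the last command.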

Upvotes: 1

devnull

Reputation: 123458

Using awk, you could say:

awk -F'[:#]' 'FNR==NR {_[$1];next} $3 in _' pound_separated_file colon_separated_file

For your input, it'd produce:

Ahmed,Safdar:D433:181:20.40:30.00
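
How it works: FNR==NR is only true while awk reads the first file, so the first field of every pound-separated line is stored as an array key; for the second file, a line is printed when its third field is one of those keys. If you also want the matching line from the pound-separated file, a variation along these lines might do. It is a sketch, not tested beyond your sample: the name show_matches is made up, and it assumes the first field is unique within the pound-separated file and that that file is not empty.

```shell
# Hypothetical helper: $1 = pound-separated file, $2 = colon-separated file.
show_matches() {
    awk -F'[:#]' '
        # First file: remember the whole line, keyed by its first field.
        FNR == NR { seen[$1] = $0; next }
        # Second file: when field 3 is a stored key, print this line
        # followed by the pound-separated line it matched.
        $3 in seen { print; print seen[$3] }
    ' "$1" "$2"
}
```

Usage: show_matches pound_separated_file colon_separated_file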

Upvotes: 1
