Reputation: 3520
I am having trouble getting the unique occurrences of DeviceId from a log file whose format is similar to the following:
log: {"deviceInfo":{"DeviceId":"123","device":"Android"}
log: {"deviceInfo":{"device":"Android","DeviceId":"123"}
log: {"deviceInfo":{"device":"Android","DeviceId":"234"}
log: {"deviceInfo":{"device":"iPhone","DeviceId":"323"}
log: {"deviceInfo":{"device":"iPhone","DeviceId":"323"}
What I am expecting is an output like this:
log: {"deviceInfo":{"DeviceId":"123","device":"Android"}
log: {"deviceInfo":{"device":"Android","DeviceId":"234"}
log: {"deviceInfo":{"device":"iPhone","DeviceId":"323"}
I tried using awk but I can't seem to figure it out. I know there should be a way to print just the DeviceId using awk; once I have the DeviceId values, I can pipe them to sort and uniq. Does anyone know how to do this?
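For reference, the approach described in the question (print only the DeviceId, then de-duplicate) could be sketched roughly as follows. This is a minimal sketch, not a definitive solution: the file name sample.log and the function name extract_ids are made up for illustration, and it assumes every ID is a quoted string directly after "DeviceId":.

```shell
# Hypothetical sample data; sample.log is an assumed file name.
cat > sample.log <<'EOF'
log: {"deviceInfo":{"DeviceId":"123","device":"Android"}
log: {"deviceInfo":{"device":"Android","DeviceId":"123"}
log: {"deviceInfo":{"device":"Android","DeviceId":"234"}
log: {"deviceInfo":{"device":"iPhone","DeviceId":"323"}
log: {"deviceInfo":{"device":"iPhone","DeviceId":"323"}
EOF

# extract_ids is an illustrative helper name, not part of any tool.
extract_ids() {
    awk 'match($0, /"DeviceId":"[^"]*"/) {
        # RSTART and RLENGTH describe the matched text "DeviceId":"<id>".
        # Skip the 12-character prefix "DeviceId":" and drop the closing quote.
        print substr($0, RSTART + 12, RLENGTH - 13)
    }' "$1" | sort -u
}

extract_ids sample.log
```

Note that this yields only the IDs, not the full log lines shown in the expected output; keeping the whole line requires de-duplicating on the ID while printing the record.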
Upvotes: 1
Views: 176
Reputation: 9926
It is better to parse the JSON properly, but here is another quick awk:
awk -F'.*DeviceId":"|["}]' '!A[$2]++' file
Applying Ed Morton's suggestion for shaving off 3 more characters:
awk -F'.*DeviceId":"|"' '!A[$2]++' file
Upvotes: 1
Reputation: 203349
With any awk:
$ awk '{id=$0;gsub(/.*DeviceId":"|".*/,"",id)} !seen[id]++' file
log: {"deviceInfo":{"DeviceId":"123","device":"Android"}
log: {"deviceInfo":{"device":"Android","DeviceId":"234"}
log: {"deviceInfo":{"device":"iPhone","DeviceId":"323"}
Upvotes: 1
Reputation: 246799
With GNU awk:
gawk 'match($0, /DeviceId":"([^"]+)/, a) && seen[a[1]]++ == 0' log
Given your input, this outputs
log: {"deviceInfo":{"DeviceId":"123","device":"Android"}
log: {"deviceInfo":{"device":"Android","DeviceId":"234"}
log: {"deviceInfo":{"device":"iPhone","DeviceId":"323"}
(Note, this is essentially the gawk translation of @Perleone's answer although I did not notice at the time)
Upvotes: 4
Reputation: 85785
Unique device IDs using awk:
$ awk '/DeviceId/&&!a[$1]++&&gsub(/[^[:digit:]]/,"")' RS='[{,}]' file
123
234
323
The nice thing about awk is its associative arrays: there is no need to pipe to sort -u.
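The associative-array idea can also be spelled out step by step. The following is a rough sketch under stated assumptions: sample.log and first_per_id are hypothetical names, and the array seen simply records which IDs have already been printed. Unlike the one-liner above, it keeps the whole line for the first occurrence of each ID, in input order.

```shell
# Hypothetical sample data; sample.log is an assumed file name.
cat > sample.log <<'EOF'
log: {"deviceInfo":{"DeviceId":"123","device":"Android"}
log: {"deviceInfo":{"device":"Android","DeviceId":"123"}
log: {"deviceInfo":{"device":"Android","DeviceId":"234"}
log: {"deviceInfo":{"device":"iPhone","DeviceId":"323"}
log: {"deviceInfo":{"device":"iPhone","DeviceId":"323"}
EOF

# first_per_id is an illustrative helper name.
first_per_id() {
    awk 'match($0, /"DeviceId":"[^"]*"/) {
        # Pull the ID out of the matched text "DeviceId":"<id>".
        id = substr($0, RSTART + 12, RLENGTH - 13)
        if (!(id in seen)) {   # first time this ID appears
            seen[id] = 1       # remember it in the associative array
            print              # print the whole line
        }
    }' "$1"
}

first_per_id sample.log
```

The explicit if block is equivalent to the shorter !seen[id]++ idiom used in the other answers.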
Upvotes: 1
Reputation: 14768
Based on @cnicutar's answer, use sed, sort and cut:
sed 's/.*"DeviceId":"\([0-9]*\).*/\1\t&/' file | sort -u -k 1,1 | cut -f 2
Output:
log: {"deviceInfo":{"DeviceId":"123","device":"Android"}
log: {"deviceInfo":{"device":"Android","DeviceId":"234"}
log: {"deviceInfo":{"device":"iPhone","DeviceId":"323"}
Upvotes: 1
Reputation: 4038
Use Perl:
perl -lne 'if ( m{"DeviceId":" ([^"]+) "}xms ) { print if not $seen{$1}++; }' <log
Upvotes: 4