thunderousNinja

Reputation: 3520

How to get unique occurrences from a file?

I am having trouble getting the unique occurrences of DeviceId from a log file with a format similar to the following:

log: {"deviceInfo":{"DeviceId":"123","device":"Android"}
log: {"deviceInfo":{"device":"Android","DeviceId":"123"}
log: {"deviceInfo":{"device":"Android","DeviceId":"234"}
log: {"deviceInfo":{"device":"iPhone","DeviceId":"323"}
log: {"deviceInfo":{"device":"iPhone","DeviceId":"323"}

What I am expecting is an output like this:

log: {"deviceInfo":{"DeviceId":"123","device":"Android"}
log: {"deviceInfo":{"device":"Android","DeviceId":"234"}
log: {"deviceInfo":{"device":"iPhone","DeviceId":"323"}

I tried using awk, but I can't seem to figure it out. Does anyone know how to do this?

I know there should be a way to print just the DeviceId using awk, but I can't work out how. Once I have the DeviceId, I can just pipe it to sort and uniq.
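One way to print just the DeviceId, as described above, is to use POSIX awk's match() with RSTART and RLENGTH and then pipe through sort and uniq. This is a minimal sketch that assumes every ID is wrapped in double quotes exactly as in the sample lines. The variable names sample and ids are only for illustration. Note that it prints bare IDs, not the full log lines.

```shell
#!/bin/sh
# Sample lines copied from the question.
sample='log: {"deviceInfo":{"DeviceId":"123","device":"Android"}
log: {"deviceInfo":{"device":"Android","DeviceId":"123"}
log: {"deviceInfo":{"device":"Android","DeviceId":"234"}
log: {"deviceInfo":{"device":"iPhone","DeviceId":"323"}
log: {"deviceInfo":{"device":"iPhone","DeviceId":"323"}'

# match() sets RSTART/RLENGTH for the text "DeviceId":"<id>".
# The prefix "DeviceId":" is 12 characters and the closing quote is 1,
# so the ID starts at RSTART+12 and is RLENGTH-13 characters long.
ids=$(printf '%s\n' "$sample" |
  awk 'match($0, /"DeviceId":"[^"]*"/) { print substr($0, RSTART + 12, RLENGTH - 13) }' |
  sort | uniq)

printf '%s\n' "$ids"
```

With the sample input this prints 123, 234 and 323, one per line. sort -u would do the same job as sort | uniq here.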

Upvotes: 1

Views: 176

Answers (6)

Scrutinizer

Reputation: 9926

Parsing the JSON properly would be better, but here is another quick awk:

awk -F'.*DeviceId":"|["}]' '!A[$2]++' file

Applying Ed Morton's suggestion to shave off three more characters:

awk -F'.*DeviceId":"|"' '!A[$2]++' file
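To see why the shorter version works, it helps to print the first two fields that this FS produces. The alternative .*DeviceId":" matches at the very start of each line, so $1 is empty and $2 runs from the ID up to the next double quote. A small sketch using the question's sample lines (the variable names are only for illustration):

```shell
#!/bin/sh
# Sample lines copied from the question.
sample='log: {"deviceInfo":{"DeviceId":"123","device":"Android"}
log: {"deviceInfo":{"device":"Android","DeviceId":"123"}
log: {"deviceInfo":{"device":"Android","DeviceId":"234"}
log: {"deviceInfo":{"device":"iPhone","DeviceId":"323"}
log: {"deviceInfo":{"device":"iPhone","DeviceId":"323"}'

# FS has two alternatives: .*DeviceId":" (everything through the quote
# that opens the ID) or a single double quote.
fields=$(printf '%s\n' "$sample" |
  awk -F'.*DeviceId":"|"' '{ printf "$1=<%s> $2=<%s>\n", $1, $2 }')
printf '%s\n' "$fields"

# Keying the array on $2 then keeps the first line seen for each ID.
unique=$(printf '%s\n' "$sample" | awk -F'.*DeviceId":"|"' '!A[$2]++')
printf '%s\n' "$unique"
```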

Upvotes: 1

Ed Morton

Reputation: 203349

With any awk:

$ awk '{id=$0;gsub(/.*DeviceId":"|".*/,"",id)} !seen[id]++' file
log: {"deviceInfo":{"DeviceId":"123","device":"Android"}
log: {"deviceInfo":{"device":"Android","DeviceId":"234"}
log: {"deviceInfo":{"device":"iPhone","DeviceId":"323"}

Upvotes: 1

glenn jackman

Reputation: 246799

With GNU awk:

gawk 'match($0, /DeviceId":"([^"]+)/, a) && seen[a[1]]++ == 0' log

Given your input, this outputs:

log: {"deviceInfo":{"DeviceId":"123","device":"Android"}
log: {"deviceInfo":{"device":"Android","DeviceId":"234"}
log: {"deviceInfo":{"device":"iPhone","DeviceId":"323"}

(Note: this is essentially the gawk translation of @Perleone's answer, although I did not notice that at the time.)

Upvotes: 4

Chris Seymour

Reputation: 85785

Unique device IDs using awk (the multi-character RS needs an awk that treats it as a regular expression, such as GNU awk):

$ awk '/DeviceId/&&!a[$1]++&&gsub(/[^[:digit:]]/,"")' RS='[{,}]' file
123
234
323

The nice thing about awk is its associative arrays: there is no need to pipe to sort -u.
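The associative-array idiom used in most of these answers (!seen[key]++ or !A[$2]++) can be seen in isolation with some made-up input. The expression is true only the first time a key appears, so awk prints first occurrences in their original order (b, a, c below), whereas sort -u returns the unique lines in sorted order (a, b, c). The input and variable names are purely illustrative.

```shell
#!/bin/sh
# Made-up input: repeated keys in no particular order.
input='b
a
b
c
a'

# seen[$0]++ yields the previous count (0 the first time a line is seen),
# so !seen[$0]++ is true only for the first occurrence of each line.
awk_out=$(printf '%s\n' "$input" | awk '!seen[$0]++')

# sort -u also removes duplicates, but its output is in sorted order.
sort_out=$(printf '%s\n' "$input" | sort -u)

printf 'awk:\n%s\nsort -u:\n%s\n' "$awk_out" "$sort_out"
```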

Upvotes: 1

Rubens

Reputation: 14768

Based on @cnicutar's answer, use sed, sort and cut (the \t in the replacement requires GNU sed):

sed 's/.*"DeviceId":"\([0-9]*\).*/\1\t&/' file | sort -u -k 1,1 | cut -f 2

Output:

log: {"deviceInfo":{"DeviceId":"123","device":"Android"}
log: {"deviceInfo":{"device":"Android","DeviceId":"234"}
log: {"deviceInfo":{"device":"iPhone","DeviceId":"323"}
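The sed, sort and cut pipeline is a decorate, sort, undecorate trick: sed prefixes each line with its ID and a tab, sort -u -k 1,1 keeps one line per ID, and cut -f 2 strips the prefix again. This sketch keeps the intermediate result in a variable so it can be inspected. It assumes GNU sed (for \t in the replacement), and the variable names are only for illustration.

```shell
#!/bin/sh
# Sample lines copied from the question.
sample='log: {"deviceInfo":{"DeviceId":"123","device":"Android"}
log: {"deviceInfo":{"device":"Android","DeviceId":"123"}
log: {"deviceInfo":{"device":"Android","DeviceId":"234"}
log: {"deviceInfo":{"device":"iPhone","DeviceId":"323"}
log: {"deviceInfo":{"device":"iPhone","DeviceId":"323"}'

# Decorate: prefix each line with its ID and a tab
# (\1 is the captured ID, & is the whole matched line).
decorated=$(printf '%s\n' "$sample" | sed 's/.*"DeviceId":"\([0-9]*\).*/\1\t&/')
printf '%s\n' "$decorated"

# Sort on the ID field only, keeping one line per ID,
# then undecorate by keeping only the second tab-separated field.
result=$(printf '%s\n' "$decorated" | sort -u -k 1,1 | cut -f 2)
printf '%s\n' "$result"
```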

Upvotes: 1

Perleone

Reputation: 4038

Use Perl:

perl -lne 'if ( m{"DeviceId":" ([^"]+) "}xms ) { print if not $seen{$1}++; }' <log

Upvotes: 4
