Sammie
Sammie

Reputation: 161

Counting Number of Appearances with Multiple Conditions

I'm trying to count the number of times an email appears in my data if it appears in Week 6. I'm currently using this code

if week=6 has6=1.
aggregate out=* mode=addvariables overwrite=yes/break=email /has6=max(has6).
aggregate out=* mode=addvariables /break=email /n=sum(has6).

This is working as expected but I left out one condition that I need. Email addresses can show up multiple times per week, but I only want to count once per week.

For example:

  Email            Week         N
[email protected]         6          2
[email protected]       3          
[email protected]       4         
[email protected]    6          1
[email protected]         4          2
[email protected]         4          2

So even though [email protected] appears in the data three times, I only want her to show as N=2 because she appears only two times in different weeks (6 and 4). I do not want it to count her second instance in week 4 (which is what the current code is doing: She is being counted as a 3 because she appears three times).

So my two conditions are:

  1. Only count email addresses that appear in Week 6 at least once.
  2. Only count once per week.

TIA!

Upvotes: 2

Views: 69

Answers (1)

eli-k
eli-k

Reputation: 11360

This will recreate the example data you gave:

data list list/email(a50) week(f1).
begin data
"[email protected]" 6
"[email protected]" 3
"[email protected]" 4
"[email protected]" 6
"[email protected]" 4
"[email protected]" 4
end data.

First we'll identify each email that had week 6 at least once, and mark all it's occurences :

if week=6 has6=1.
aggregate out=* mode=addvariables overwrite=yes/break=email /has6=max(has6).

Now there are two ways to proceed.

First method: Second aggregate and reattach results to data:.

sort cases by email week.
dataset name orig.
dataset declare agg.
aggregate out=agg /break=email week/has6=max(has6).
dataset activate agg.
select if has6.
aggregate out=* mode=addvariables/break email/n=n.
dataset activate orig.
match files /file=* /table=agg /by email week.
exe.

Second method: Comparing rows to discount non-unique cases:

sort cases by email week.
compute countThis=has6.
if $casenum>1 and has6 and lag(email)=email and lag(week)=week countThis=0.
exe.
aggregate out=* mode=addvariables /break=email /n=sum(countThis).

Upvotes: 1

Related Questions