Determining Total Frequency of a Numeric ID with Several ID and Frequency Columns

Question

I apologize for the confusing title, I've added an example to clarify. I believe this is actually pretty easy, but I just can't for the life of me determine how to do it. Essentially I have long lists of IDs and their corresponding frequencies, around 45 lists all very long. I want to determine the total frequency of each ID, but the issue is that each ID is not in each list. I tried importing the data into access, but the operation was too complicated and resulted in errors in access.

I would be able to do this in SPSS if there were a discrete number of IDs for If statements, but in total there are around 10,000 IDs necessitating a way to do this quickly.

The data looks like this, with 50 ID Code Columns with an associated frequency column.

ID CODE  Frequency  ID CODE     Frequency 
0001         3        0002          3
0002         4        0003          4
0003         2

Expected output

ID CODE  Frequency  ID CODE     Frequency    Final ID Code  Total Frequency
0001         3        0002          3          0001                 3
0002         4        0003          4          0002                 7
0003         2                                 0003                 6

I think that this is possible with vlookup, but I'm not really sure how to go about this.

I apologize for the rudimentary question, look forward to any comments and will provide any answers and clarifications.

eli-k · Accepted Answer

Calculating the total frequencies would be easy in SPSS. After reading the file into SPSS, run the following syntax:
(this assumes variable names will have changed automatically this way: IDCODE Frequency IDCODE_A Frequency_A IDCODE_B Frequency_B etc')

varstocases /make FinalID from IDCODE IDCODE_A IDCODE_B
            /make fr   from Frequency Frequency_A Frequency_B.
dataset name orig.
dataset declare summary.
aggregate /outfile=summary /break=FinalID /TotalFreq=sum(fr).

The original data will now be organized in long format, which is easier to analyse. The summarized frequencies will appear in a new dataset called "summary".

If you have many more data columns in your file, putting all the variable names in the syntax can be a hassle. You can shorten the process by sorting the variables by name in the variable view window and copying them from there.
But if you're going to repeat the process and might have a different number of variables in each run, you should automate the process completely. You can do it like this:

spssinc select variables macroname="!ID" /properties pattern = "IDCODE*".
spssinc select variables macroname="!FRQ" /properties pattern = "Frequency*".

These commands automatically define variable lists which you can now use like this:

varstocases /make FinalID from !ID
            /make fr from !FRQ.

Determining Total Frequency of a Numeric ID with Several ID and Frequency Columns

Answers (2)

Related Questions