Reputation: 189
I apologize for the confusing title, I've added an example to clarify. I believe this is actually pretty easy, but I just can't for the life of me determine how to do it. Essentially I have long lists of IDs and their corresponding frequencies, around 45 lists all very long. I want to determine the total frequency of each ID, but the issue is that each ID is not in each list. I tried importing the data into access, but the operation was too complicated and resulted in errors in access.
I would be able to do this in SPSS if there were a discrete number of IDs for If statements, but in total there are around 10,000 IDs necessitating a way to do this quickly.
The data looks like this, with 50 ID Code Columns with an associated frequency column.
ID CODE Frequency ID CODE Frequency
0001 3 0002 3
0002 4 0003 4
0003 2
Expected output
ID CODE Frequency ID CODE Frequency Final ID Code Total Frequency
0001 3 0002 3 0001 3
0002 4 0003 4 0002 7
0003 2 0003 6
I think that this is possible with vlookup, but I'm not really sure how to go about this.
I apologize for the rudimentary question, look forward to any comments and will provide any answers and clarifications.
Upvotes: 1
Views: 128
Reputation: 11360
Calculating the total frequencies would be easy in SPSS. After reading the file into SPSS, run the following syntax:
(this assumes variable names will have changed automatically this way: IDCODE Frequency IDCODE_A Frequency_A IDCODE_B Frequency_B etc')
varstocases /make FinalID from IDCODE IDCODE_A IDCODE_B
/make fr from Frequency Frequency_A Frequency_B.
dataset name orig.
dataset declare summary.
aggregate /outfile=summary /break=FinalID /TotalFreq=sum(fr).
The original data will now be organized in long format, which is easier to analyse. The summarized frequencies will appear in a new dataset called "summary".
If you have many more data columns in your file, putting all the variable names in the syntax can be a hassle. You can shorten the process by sorting the variables by name in the variable view window and copying them from there.
But if you're going to repeat the process and might have a different number of variables in each run, you should automate the process completely. You can do it like this:
spssinc select variables macroname="!ID" /properties pattern = "IDCODE*".
spssinc select variables macroname="!FRQ" /properties pattern = "Frequency*".
These commands automatically define variable lists which you can now use like this:
varstocases /make FinalID from !ID
/make fr from !FRQ.
Upvotes: 2
Reputation: 892
Try using the SUMIFS function. Here is what I have in cell F2:
`=SUMIFS($B$2:$B$4,$A$2:$A$4,E2) + SUMIFS($D$2:$D$4,$C$2:$C$4,E2)`
I then copied it down to the cells below. The E2 became E3 in the 2nd data row and then E4 was in the 3rd data row.
Hopefully this example will give you an idea of how it works. You'll probably have to do some adjusting to account for more rows, and if you don't have the same columns A-F like I do.
Upvotes: 1