Reputation: 69
I have a large dataset in SPSS as follows:
Col_1 Col_2 Qty
a x 5
a x 5
a y 2
b x 1
b y 6
b y 7
c y 4
c z 10
c z 11
d x 2
Using the following syntax I aggregate the dataset:
AGGREGATE
/OUTFILE='agg'
/PRESORTED
/BREAK=Col_1 Col_2
/Qty_sum=SUM(Qty).
This is the result I get:
Col_1 Col_2 Qty_sum
a x 10
a y 2
b x 1
b y 13
c y 4
c z 21
d x 2
In addition to this, I want the results to give me a percentage as follows:
Col_1 Col_2 Qty_sum Percent
a x 10 83%
a y 2 17%
b x 1 7%
b y 13 93%
c y 4 16%
c z 21 84%
d x 2 100%
The Percent
column has a percentage that shows how the Qty_sum
is spread across the values in Col_2
.
Since there's no consistency in the way the data in Col_1
and Col_2
are recorded, I cannot use a generic formula for the entire dataset.
I cannot work in Excel
using the Sumifs
function as the dataset is too large for Excel
.
I tried using the Row Percentages
option in the Crosstabs
function, but that works only to get a count across two rows of categorical variables.
Instead of SPSS
, please let me know if this can be done in R
as well.
Upvotes: 1
Views: 436
Reputation: 11350
In SPSS, do this:
AGGREGATE /OUTFILE=* mode=addvariables /BREAK=Col_1 /Qty_sum_col1=SUM(Qty_sum).
compute Col2fraction=Qty_sum/Qty_sum_col1.
The aggregate
command will add a new variable Qty_sum_col1
which will contain the sum of Qty
for every value of col_1
. The following compute
creates Col2fraction
by dividing Qty_sum
by Qty_sum_col1
, resulting in the wanted fraction.
Upvotes: 1
Reputation: 5263
For R, you can use the dplyr
package:
library(dplyr)
DF <- data.frame(
Col_1 = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "d"),
Col_2 = c("x", "x", "y", "x", "y", "y", "y", "z", "z", "x"),
Qty = c( 5, 5, 2, 1, 6, 7, 4, 10, 11, 2)
)
result <- DF %>%
group_by(Col_1, Col_2) %>%
summarise(Qty_Sum = sum(Qty)) %>%
group_by(Col_1) %>%
mutate(Percent = Qty_Sum / sum(Qty_Sum))
Upvotes: 1