Achyutha Mohan
Achyutha Mohan

Reputation: 69

How can I get a percentages column when I aggregate a large dataset in SPSS?

I have a large dataset in SPSS as follows:

Col_1   Col_2   Qty
a         x      5
a         x      5
a         y      2
b         x      1
b         y      6
b         y      7 
c         y      4
c         z     10
c         z     11
d         x      2

Using the following syntax I aggregate the dataset:

AGGREGATE
  /OUTFILE='agg'
  /PRESORTED
  /BREAK=Col_1 Col_2
  /Qty_sum=SUM(Qty).

This is the result I get:

Col_1   Col_2   Qty_sum
a          x    10
a          y    2
b          x    1
b          y    13
c          y    4
c          z    21
d          x    2

In addition to this, I want the results to give me a percentage as follows:

Col_1   Col_2   Qty_sum Percent
a          x    10        83%
a          y    2         17%
b          x    1         7%
b          y    13        93%
c          y    4         16%
c          z    21        84%
d          x    2         100%

The Percent column has a percentage that shows how the Qty_sum is spread across the values in Col_2.

Since there's no consistency in the way the data in Col_1 and Col_2 are recorded, I cannot use a generic formula for the entire dataset.

I cannot work in Excel using the Sumifs function as the dataset is too large for Excel.

I tried using the Row Percentages option in the Crosstabs function, but that works only to get a count across two rows of categorical variables.

Instead of SPSS, please let me know if this can be done in R as well.

Upvotes: 1

Views: 436

Answers (2)

eli-k
eli-k

Reputation: 11350

In SPSS, do this:

AGGREGATE /OUTFILE=* mode=addvariables /BREAK=Col_1 /Qty_sum_col1=SUM(Qty_sum).
compute Col2fraction=Qty_sum/Qty_sum_col1.

The aggregate command will add a new variable Qty_sum_col1 which will contain the sum of Qty for every value of col_1. The following compute creates Col2fraction by dividing Qty_sum by Qty_sum_col1, resulting in the wanted fraction.

Upvotes: 1

Nathan Werth
Nathan Werth

Reputation: 5263

For R, you can use the dplyr package:

library(dplyr)    

DF <- data.frame(
  Col_1 = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "d"),
  Col_2 = c("x", "x", "y", "x", "y", "y", "y", "z", "z", "x"),
  Qty   = c(  5,   5,   2,   1,   6,   7,   4,  10,  11,   2)
)

result <- DF %>%
  group_by(Col_1, Col_2) %>%
  summarise(Qty_Sum = sum(Qty)) %>%
  group_by(Col_1) %>%
  mutate(Percent = Qty_Sum / sum(Qty_Sum))

Upvotes: 1

Related Questions