juahan

Reputation: 33

Frequencies and one variable factor frequencies to columns in R

I'm trying to format my data in R so that I can then use it properly for different general linear models.

The data is like this:

> str(data)
'data.frame':   1978 obs. of  7 variables:
 $ country                : Factor w/ 22 levels "AT","BE","CH",..: 8 8 8 8 8 8 8 8 8 8 ...
 $ age                    : num  65 77 36 28 23 15 75 20 44 73 ...
 $ gender                 : Factor w/ 2 levels "male","female": 2 1 1 1 2 2 1 2 2 1 ...
 $ education_level        : Factor w/ 6 levels "less_than_lower_sec",..: 5 1 3 5 5 2 1 3 3 5 ...
 $ good_citizen_importance: Factor w/ 11 levels "00","01","02",..: 11 9 9 9 10 10 7 10 10 9 ...
 $ trade                  : Factor w/ 7 levels "none_apply","member",..: 2 4 4 2 2 4 4 4 2 4 ...
 $ relig                  : Factor w/ 7 levels "none_apply","member",..: 2 2 4 4 4 4 2 5 4 4 ...

Snippet from the data itself:

> head(data)
      country age gender     education_level good_citizen_importance   trade   relig
13711      FI  65 female            tertiary                      10  member  member
13712      FI  77   male less_than_lower_sec                      08 donated  member
13713      FI  36   male           upper_sec                      08 donated donated
13714      FI  28   male            tertiary                      08  member donated
13715      FI  23 female            tertiary                      09  member donated
13716      FI  15 female           lower_sec                      09 donated donated

I have managed to produce frequency counts like the ones below, so I'm almost there. However, I would like each level of the "good_citizen_importance" variable, with its associated count, to become its own column.

> counts <- count(data, c("good_citizen_importance", "trade", "relig", "gender"))
> head(counts)
  good_citizen_importance   trade   relig gender freq
1                      00 donated  member   male    1
2                      00 donated donated   male    1
3                      01  member donated female    1
4                      01 donated donated   male    2
5                      01 donated donated female    1
6                      02  member  member female    1  

This is how I would like to have the data:

> head(counts)
    trade   relig   gender "00" "01" "02" ...
1   donated member  male     1    5    7   ...
2   donated donated male     12   2    3   ...
3   member  donated female   11   3    1   ...
4   donated donated male     25   1    4   ...
5   donated donated female   12   1    1   ...
6   member  member  female   11   1    1   ...

So I would like the frequency of every level of one variable for each combination of the other variables. In other words, one frequency column for each of the 11 levels of the "good_citizen_importance" variable.

I'm sure this is not a very hard problem, but I have been fighting with it for several hours already and I think I have exhausted my R and Google skills.

Upvotes: 1

Views: 52

Answers (1)

Melissa Key

Reputation: 4551

This can be accomplished by reshaping the data from long to wide format. In base R, the function reshape can be used, but its syntax is awkward (I used to use it regularly, and I had to look up the syntax EVERY time). A better solution is spread from the tidyverse suite of packages (specifically, it's in the tidyr package):

library(tidyr) # or library(tidyverse)
# One column per level of good_citizen_importance; combinations with no rows get 0
counts_wide <- counts %>%
  spread(good_citizen_importance, freq, fill = 0)
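In newer tidyr releases (1.0.0 and later), spread is marked as superseded by pivot_wider, so the same reshape can also be written with that function. The following is a minimal sketch, not the only way to do it: the small `counts` data frame is made-up toy data standing in for the one in the question, and `values_fill` is given as a named list because that form is accepted by both older and newer versions of pivot_wider.

```r
library(tidyr)

# Toy stand-in for the `counts` data frame from the question (values are invented)
counts <- data.frame(
  good_citizen_importance = c("00", "00", "01", "01"),
  trade  = c("donated", "member", "donated", "member"),
  relig  = c("member", "member", "member", "member"),
  gender = c("male", "female", "male", "male"),
  freq   = c(1, 2, 3, 4)
)

# Each level of good_citizen_importance becomes a column holding its freq;
# trade/relig/gender combinations that lack a level are filled with 0
counts_wide <- pivot_wider(
  counts,
  names_from  = good_citizen_importance,
  values_from = freq,
  values_fill = list(freq = 0)
)

print(counts_wide)
```

All columns not named in `names_from` or `values_from` (here trade, relig and gender) are treated as identifiers, so each distinct combination of them ends up as one row.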

If you aren't familiar with the pipe operator (%>%), it takes the value on its left-hand side (typically the output of the previous function) and passes it as the first argument to the function on its right. It's used to make code easier to read by avoiding deeply nested function calls.
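To make the pipe concrete, here is a small sketch showing the piped call next to the equivalent ordinary call. The `counts` data frame is invented toy data, not the data from the question.

```r
library(tidyr)

# Toy data in the same shape as the question's `counts` (values are invented)
counts <- data.frame(
  good_citizen_importance = c("00", "01", "01"),
  trade  = c("donated", "donated", "member"),
  gender = c("male", "male", "female"),
  freq   = c(1, 2, 3)
)

# Piped form: `counts` is passed as the first argument of spread()
piped <- counts %>%
  spread(good_citizen_importance, freq, fill = 0)

# Equivalent call without the pipe
nested <- spread(counts, good_citizen_importance, freq, fill = 0)

# Compare the two results
print(identical(piped, nested))
```

With a single function the two forms are equally readable; the pipe starts to pay off once several steps are chained one after another.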

Upvotes: 1
