andy
andy

Reputation: 2081

Analysing Multiple Choice Values in R - Getting Frequency count of Variables

I imported the following data in which one of the questions was structured like this:

Question a) Type_of_input [MULTIPLE SELECT]

  1. 1:Fertiliser
  2. 2:Pesticide
  3. 3:Herbicide
  4. 4:Sprayer

Question b) Total Volume.

In the data.frame, the data is split in a matrix with each of the options being in separate columns and the observations being 0 and 1. 1 is if the option was selected and 0 if the option wasn't. see below a mock up of the data.frame.

Type_of_input <- c("1:Fertiliser|2:Pesticide|4:Sprayer", "2:Pesticide|3:Herbicides", "2:Pesticide|3:Herbicide|4:Sprayer")
Fertiliser <- c(1,0,0)
Pesticide <- c(1,1,1)
Herbicide <- c(0,1,1)
Sprayer <- c(1,0,1)
total_volume <- c(40,50,60)
df_inputs <- data.frame(Type_of_input, Fertiliser, Pesticide, Herbicide, Sprayer, volume)

df_inputs

                       Type_of_input Fertiliser Pesticide Herbicide Sprayer total_volume
1 1:Fertiliser|2:Pesticide|4:Sprayer          1         1         0       1           40
2           2:Pesticide|3:Herbicides          0         1         1       0           50
3  2:Pesticide|3:Herbicide|4:Sprayer          0         1         1       1           60

How do I get a frequency table count of each of the inputs and their total_volume?

Upvotes: 0

Views: 1201

Answers (2)

Andre Elrico
Andre Elrico

Reputation: 11480

You simply do:

sapply(df_inputs[-1],sum)

Upvotes: 1

AntoniosK
AntoniosK

Reputation: 16121

This is one solution, where you calculate the sum for each column of interest

Type_of_input <- c("1:Fertiliser|2:Pesticide|4:Sprayer", "2:Pesticide|3:Herbicides", "2:Pesticide|3:Herbicide|4:Sprayer")
Fertiliser <- c(1,0,0)
Pesticide <- c(1,1,1)
Herbicide <- c(0,1,1)
Sprayer <- c(1,0,1)
df_inputs <- data.frame(Type_of_input, Fertiliser, Pesticide, Herbicide, Sprayer)

library(dplyr)

df_inputs %>%
  select(-Type_of_input) %>%
  summarise_all(sum)

#    Fertiliser Pesticide Herbicide Sprayer
# 1          1         3         2       2

You can have a different format like this

library(tidyverse)

df_inputs %>%
  select(-Type_of_input) %>%
  summarise_all(sum) %>%
  gather(var, value) %>%
  arrange(desc(value))

#          var value
# 1  Pesticide     3
# 2  Herbicide     2
# 3    Sprayer     2
# 4 Fertiliser     1

in case you want to use value variable to arrange your dataset and have the most popular values on top.

Upvotes: 1

Related Questions