Reputation: 2081
I imported the following data in which one of the questions was structured like this:
Question a) Type_of_input [MULTIPLE SELECT]
Question b) Total Volume.
In the data.frame, the data is split in a matrix with each of the options being in separate columns and the observations being 0 and 1. 1 is if the option was selected and 0 if the option wasn't. see below a mock up of the data.frame
.
Type_of_input <- c("1:Fertiliser|2:Pesticide|4:Sprayer", "2:Pesticide|3:Herbicides", "2:Pesticide|3:Herbicide|4:Sprayer")
Fertiliser <- c(1,0,0)
Pesticide <- c(1,1,1)
Herbicide <- c(0,1,1)
Sprayer <- c(1,0,1)
total_volume <- c(40,50,60)
df_inputs <- data.frame(Type_of_input, Fertiliser, Pesticide, Herbicide, Sprayer, volume)
df_inputs
Type_of_input Fertiliser Pesticide Herbicide Sprayer total_volume
1 1:Fertiliser|2:Pesticide|4:Sprayer 1 1 0 1 40
2 2:Pesticide|3:Herbicides 0 1 1 0 50
3 2:Pesticide|3:Herbicide|4:Sprayer 0 1 1 1 60
How do I get a frequency table count of each of the inputs and their total_volume
?
Upvotes: 0
Views: 1201
Reputation: 16121
This is one solution, where you calculate the sum for each column of interest
Type_of_input <- c("1:Fertiliser|2:Pesticide|4:Sprayer", "2:Pesticide|3:Herbicides", "2:Pesticide|3:Herbicide|4:Sprayer")
Fertiliser <- c(1,0,0)
Pesticide <- c(1,1,1)
Herbicide <- c(0,1,1)
Sprayer <- c(1,0,1)
df_inputs <- data.frame(Type_of_input, Fertiliser, Pesticide, Herbicide, Sprayer)
library(dplyr)
df_inputs %>%
select(-Type_of_input) %>%
summarise_all(sum)
# Fertiliser Pesticide Herbicide Sprayer
# 1 1 3 2 2
You can have a different format like this
library(tidyverse)
df_inputs %>%
select(-Type_of_input) %>%
summarise_all(sum) %>%
gather(var, value) %>%
arrange(desc(value))
# var value
# 1 Pesticide 3
# 2 Herbicide 2
# 3 Sprayer 2
# 4 Fertiliser 1
in case you want to use value
variable to arrange your dataset and have the most popular values on top.
Upvotes: 1