I wasn't quite sure how to search for the topic I'm interested in, so I apologize in advance if this question has already been asked. Existing questions about frequency tables didn't resolve my doubt.
I have the following data frame, where 1 indicates a positive result and 2 a negative one:
d1 <- data.frame( Household = c(1:5), State = c("AL","AL","AL","MI","MI"), Electricity = c(1,1,1,2,2),
Fuelwood = c(2,2,1,1,1))
I want to produce a frequency table that shows, for each state, the percentage of households using only Electricity, only Fuelwood, and both Electricity and Fuelwood, such as d2:
d2 <- data.frame (State = c("AL", "MI"), Electricity = c(66.6,0), Fuelwood = c(0,100), ElectricityANDFuelwood = c(33.3,0))
Please keep in mind that my real data frame has approximately 42k households, 5 energy sources, and 27 states.
We can look for rows in d1 where Electricity and Fuelwood are both positive (1). Using that logical index, we set Electricity and Fuelwood to negative (2) in those rows, so that each household falls into exactly one category. Then we create an additional column, ElectricityANDFuelwood, from the same index. Finally, we reshape from wide to long form with melt, keep only the State and variable columns of the positive rows, and use table and prop.table to calculate the frequency and relative frequency.
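indx <- with(d1, Electricity == 1 & Fuelwood == 1)          # households using both sources
d1[indx, 3:4] <- 2                                          # mark them negative in the single-source columns (modifies d1 in place)
dT <- transform(d1, ElectricityANDFuelwood = indx + 0)[-1]  # add a 0/1 flag and drop Household
library(reshape2)
dT1 <- subset(melt(dT, id.vars = 'State'), value == 1, select = 1:2)  # positive rows only: State, variable
round(100 * prop.table(table(dT1), margin = 1), 2)          # percentages within each state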
# variable
#State Electricity Fuelwood ElectricityANDFuelwood
# AL 66.67 0.00 33.33
# MI 0.00 100.00 0.00
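With 5 energy sources, as in the real data, the number of "both" columns grows quickly. A minimal base R sketch, assuming every source column is coded 1 = used and 2 = not used, is to label each household by the exact combination of sources it uses and then tabulate that label by state. The helper source_combo and the "None" label for households using no source are hypothetical names introduced here, not part of the answer above.

```r
# Hypothetical helper: label each household by the exact combination of
# sources it uses. Assumes source columns are coded 1 = used, 2 = not used;
# "None" is an assumed label for households that use no source at all.
source_combo <- function(df, sources) {
  used <- as.matrix(df[sources] == 1)  # logical matrix, one column per source
  apply(used, 1, function(row) {
    if (any(row)) paste(sources[row], collapse = "+") else "None"
  })
}

d1 <- data.frame(Household = 1:5, State = c("AL", "AL", "AL", "MI", "MI"),
                 Electricity = c(1, 1, 1, 2, 2), Fuelwood = c(2, 2, 1, 1, 1))

# With the real data, list all five energy-source columns here
combo <- source_combo(d1, c("Electricity", "Fuelwood"))

# Rows: states; columns: combinations actually observed; cells: percentages
pct <- round(100 * prop.table(table(State = d1$State, Combo = combo), margin = 1), 2)
pct
```

Because each household gets exactly one label, the categories are mutually exclusive by construction, so no values need to be overwritten, and only combinations that actually occur become columns.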
Or, a data.table solution contributed by @David Arenburg (start from the original d1, because the code above modifies d1 in place):
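library(data.table)
# Drop Household and flag households that use both sources
d2 <- as.data.table(d1[-1])[, ElectricityANDFuelwood :=
        (Electricity == 1 & Fuelwood == 1)]
d2[(ElectricityANDFuelwood), (2:3) := 2]  # set Electricity and Fuelwood to negative (2) for those rows
d2[, lapply(.SD, function(x) 100*sum(x == 1)/.N), by = State]  # percent positive per state (TRUE == 1 for the flag)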
# State Electricity Fuelwood ElectricityANDFuelwood
#1: AL 66.66667 0 33.33333
#2: MI 0.00000 100 0.00000
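For roughly 42k households, a data.table variant of the same combination-label idea may be convenient. This is a sketch under the same 1 = used, 2 = not used coding assumption; the column Combo and the "None" label are hypothetical. The label is built one source column at a time, so the work is vectorised over rows, and dcast spreads the within-state percentages into one column per observed combination.

```r
library(data.table)

d1 <- data.frame(Household = 1:5, State = c("AL", "AL", "AL", "MI", "MI"),
                 Electricity = c(1, 1, 1, 2, 2), Fuelwood = c(2, 2, 1, 1, 1))

# With the real data, list all five energy-source columns here
sources <- c("Electricity", "Fuelwood")
dt <- as.data.table(d1)

# Build the combination label one source at a time (vectorised over rows),
# appending a source's name wherever that source is used (value 1)
lab <- Reduce(function(acc, s) {
  hit <- dt[[s]] == 1
  ifelse(hit, ifelse(acc == "", s, paste(acc, s, sep = "+")), acc)
}, sources, rep("", nrow(dt)))
lab[lab == ""] <- "None"  # assumed label for households using no source
dt[, Combo := lab]

# Count households per state and combination, then convert to within-state percentages
pct <- dt[, .N, by = .(State, Combo)][, Pct := 100 * N / sum(N), by = State]

# One row per state, one column per observed combination (0 where absent)
wide <- dcast(pct, State ~ Combo, value.var = "Pct", fill = 0)
wide
```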