Reputation: 65
I have a large data set of 1000 rows and 10000 columns. I give a short example with the following code:
V1 = c(1,2,1,1,NA,1,0,2,2,1,0)
V2 = c(0,2,NA,2,1,2,0,1,NA,0,0)
V3 = c(3,1,1,1,NA,1,0,2,2,1,2)
V4 = c(1,2,NA,1,0,2,0,2,NA,2,0)
datos = data.frame(cbind(V1, V2, V3, V4))
Each of my columns has the values 0, 1 or 2. I need to get the frequency of 0 and of 1 in each column. Note that the data set also has NA values, which should not be taken into account when estimating the frequencies.
For example, for V1 the frequency of 0 is 2/10 = 0.2 and the frequency of 1 is 5/10 = 0.5, so the sum is 0.2 + 0.5 = 0.7.
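A quick check of those numbers on the example column (just a sketch using mean() with na.rm = TRUE, which drops the NA from both the count and the denominator):
mean(V1 == 0, na.rm = TRUE)   # 0.2
mean(V1 == 1, na.rm = TRUE)   # 0.5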
I think that apply(datos, 2, x) could help, but I am not sure what the function x should be.
Does anyone know a way to solve this?
Thanks
Upvotes: 1
Views: 532
Reputation: 101343
Here is a base R solution
res <- Map(function(x) sum(head(prop.table(table(na.omit(x))), 2)), datos)
such that
> res
$V1
[1] 0.7
$V2
[1] 0.6666667
$V3
[1] 0.6
$V4
[1] 0.5555556
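Unpacking the pipeline for a single column may make it clearer (a sketch on V1; head(..., 2) keeps the proportions of 0 and 1 because the table is sorted by value):
x <- na.omit(datos$V1)               # drop the NA
table(x)                             # counts: 0 -> 2, 1 -> 5, 2 -> 3
prop.table(table(x))                 # proportions: 0.2, 0.5, 0.3
sum(head(prop.table(table(x)), 2))   # 0.2 + 0.5 = 0.7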
Upvotes: 0
Reputation: 39858
One dplyr and purrr option could be:
map(.x = list(zeros = 0, ones = 1),
~ summarise_all(datos, ~ sum(. == !!.x, na.rm = TRUE)/length(na.omit(.))))
$zeros
V1 V2 V3 V4
1 0.2 0.4444444 0.1 0.3333333
$ones
V1 V2 V3 V4
1 0.5 0.2222222 0.5 0.2222222
If you also want a combined value for the two:
map(.x = list(zeros = 0, ones = 1),
~ summarise_all(datos, ~ sum(. == !!.x, na.rm = TRUE)/length(na.omit(.)))) %>%
reduce(`+`)
V1 V2 V3 V4
1 0.7 0.6666667 0.6 0.5555556
Or using just dplyr if you are looking for the combined values:
datos %>%
summarise_all(~ sum(. %in% c(0, 1), na.rm = TRUE)/length(na.omit(.)))
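On the example data this should return the same combined values as the reduce() approach above:
   V1        V2  V3        V4
1 0.7 0.6666667 0.6 0.5555556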
Upvotes: 1
Reputation: 887118
We can replace the values other than 0 or 1 with NA, get the counts with table, and convert them to proportions with prop.table
sapply(datos, function(x) prop.table(table(replace(x, !x %in% 0:1, NA))))
If we need to consider the whole length, i.e. the number of rows of the original dataset
sapply(datos, function(x) table(replace(x, !x %in% 0:1, NA)))/nrow(datos)
Upvotes: 3