marb_021
marb_021

Reputation: 65

How to get the frequency of specific values?

I have a large data set of 1000 rows and 10000 columns. I give a short example with the following code:

V1 = c(1,2,1,1,NA,1,0,2,2,1,0)
V2 = c(0,2,NA,2,1,2,0,1,NA,0,0)
V3 = c(3,1,1,1,NA,1,0,2,2,1,2)
V4 = c(1,2,NA,1,0,2,0,2,NA,2,0)
datos =data.frame(cbind(V1,V2,V3,V4))

Each of my columns has values = "0", "1" or "2". I need to get the frequency of the "0" and the "1" in each column. Note that the data set also have "NA" values that I do not have to take into account to estimate the frequencies.

For example for V1 the frequency of "0" is = 2/10 = 0,2, and the frequency of "1" is = 5/10 = 0,5. So the sum is 0,2 + 0,5 = 0,7.

I think that apply(datos, 2, x) could help, but I am not sure how could be the function (x).

Does anyone know a way to solve this?

Thanks

Upvotes: 1

Views: 532

Answers (3)

ThomasIsCoding
ThomasIsCoding

Reputation: 101343

Here is a base R solution

res <- Map(function(x) sum(head(prop.table(table(na.omit(x))),2)),datos)

such that

> res
$V1
[1] 0.7

$V2
[1] 0.6666667

$V3
[1] 0.6

$V4
[1] 0.5555556

Upvotes: 0

tmfmnk
tmfmnk

Reputation: 39858

One dplyr and purrr option could be:

map(.x = list(zeros = 0, ones = 1), 
    ~ summarise_all(datos, ~ sum(. == !!.x, na.rm = TRUE)/length(na.omit(.))))

$zeros
   V1        V2  V3        V4
1 0.2 0.4444444 0.1 0.3333333

$ones
   V1        V2  V3        V4
1 0.5 0.2222222 0.5 0.2222222

If you want also a combined value for the two:

map(.x = list(zeros = 0, ones = 1), 
    ~ summarise_all(datos, ~ sum(. == !!.x, na.rm = TRUE)/length(na.omit(.)))) %>%
 reduce(`+`)

   V1        V2  V3        V4
1 0.7 0.6666667 0.6 0.5555556

Or using just dplyr if you are looking for combined values:

datos %>%
 summarise_all(~ sum(. %in% c(0, 1), na.rm = TRUE)/length(na.omit(.)))

Upvotes: 1

akrun
akrun

Reputation: 887118

We can replace the values other than 0 or 1 to NA, get the frequency with table and convert to proportion with prop.table

sapply(datos, function(x) prop.table(table(replace(x, !x %in% 0:1, NA))))

If we need to consider the whole length i.e number of rows of original dataset

sapply(datos, function(x) table(replace(x, !x %in% 0:1, NA)))/nrow(datos)

Upvotes: 3

Related Questions