TakeMeToTheMoon

Reputation: 547

table() undercounts when participant is only filled in on the first row of each group

In R, suppose I have a data frame my_data like this:

participant var score
1           a   ...
            b   ...
            c   ...
            a   ...
2           b   ...
            a   ...
            c   ...
            c   ...
3           b   ...
            c   ...
            a   ...
            b   ...

When I count the frequencies of var per participant with table(my_data$participant, my_data$var), the result is:

   a  b  c
1  1  0  0
2  0  1  0
3  0  1  0

while it should be

   a  b  c
1  2  1  1
2  1  1  2
3  1  2  1

This happens because table() only counts the rows in which participant is not empty. Is there a standard way to tell R that the empty rows belong to the participant listed above them?
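A minimal sketch of why the rows disappear, assuming the blank participant cells were read in as NA (read.csv does this for blank cells in a numeric column): table() drops NA values by default, and passing useNA = "ifany" makes the excluded rows visible. The data frame below is a hypothetical reconstruction of the example above, with score omitted.

```r
# Hypothetical reconstruction of the question's data: participant is only
# filled in on the first row of each block, the other rows are NA
my_data <- data.frame(
  participant = c(1, NA, NA, NA, 2, NA, NA, NA, 3, NA, NA, NA),
  var = c("a", "b", "c", "a", "b", "a", "c", "c", "b", "c", "a", "b"),
  stringsAsFactors = FALSE
)

# Default behaviour: rows whose participant is NA are silently excluded,
# which reproduces the 1/0/0 style counts from the question
counts_default <- table(my_data$participant, my_data$var)
print(counts_default)

# useNA = "ifany" adds an <NA> row that holds the 9 rows that were dropped
counts_with_na <- table(my_data$participant, my_data$var, useNA = "ifany")
print(counts_with_na)
```

If the blanks are empty strings ("") rather than NA, table() instead reports them under a separate "" row; either way, the missing IDs have to be filled in before counting.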

Upvotes: 1

Views: 54

Answers (1)

Florian

Reputation: 25415

You can use na.locf ("last observation carried forward") from the zoo package to fill each empty participant cell with the value above it:

# sample data
my_data <- data.frame(participant = c("1", "", "", "2", "", ""),
                      var = c("a", "a", "b", "a", "a", "c"),
                      stringsAsFactors = FALSE)

library(zoo)
# first, replace empty elements with NA
my_data$participant[my_data$participant == ""] <- NA
# then carry the last non-NA value forward; the first row must have an ID,
# because na.locf drops leading NAs by default
my_data$participant <- na.locf(my_data$participant)
table(my_data$participant, my_data$var)

Output:

    a b c
  1 2 1 0
  2 2 0 1
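If you would rather not depend on zoo, the same fill-down can be sketched in base R with cumsum(): every non-empty cell starts a new block, and each row looks up the ID that opened its block. This assumes, like the na.locf version, that the first row has a participant ID. The data is a hypothetical reconstruction of the question's example (score omitted), so the result should match the expected table from the question.

```r
# Question's data, with empty strings "" for the missing participant IDs
my_data <- data.frame(
  participant = c("1", "", "", "", "2", "", "", "", "3", "", "", ""),
  var = c("a", "b", "c", "a", "b", "a", "c", "c", "b", "c", "a", "b"),
  stringsAsFactors = FALSE
)

# TRUE where a participant ID is actually present (neither "" nor NA)
filled <- my_data$participant != "" & !is.na(my_data$participant)

# cumsum() numbers the blocks 1, 2, 3, ...: each filled row opens a new block
block <- cumsum(filled)

# Replace every row's ID with the ID that opened its block
# (assumes the first row is filled, otherwise block contains 0)
my_data$participant <- my_data$participant[filled][block]

counts <- table(my_data$participant, my_data$var)
print(counts)
```

If you already use tidyr, tidyr::fill() does the same fill-down once the blanks have been converted to NA.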

Hope this helps!

Upvotes: 1
