Reputation: 329
I'm new to this platform and quite new to R, so if you see any improvements in posting my questions, please feel free to advise me.
I'm currently working on evaluating a survey using R.
Question: What is the best way to compare the values in all but the first columns of a Data Frame to a vector and count the number of occurrences of the variables in the vector?
In Excel I would use countif for my problem, in R I tried the combination of different filter, group_by and loop functions - unfortunately with no success. I searched Stack Overflow for a solution, but didn't find one so far. Maybe I looked with the wrong keywords.
Example:
df<-data.frame(c("A","B","A","B","C"),c(-1,1,3,2,3),c(-3,-1,2,-1,2))
colnames(df)<- c("Participant-Type","Item 1","Item 2")
Participant-Type Item 1 Item 2
1 A -1 -3
2 B 1 -1
3 A 3 2
4 B 2 -1
5 C 3 2
vec <- c(-3,-2,-1,0,1,2,3)
[1] -3 -2 -1 0 1 2 3
I want to query the Data Frame to know how many times the different Participant-Groups choose the values in the Vector for Item 1 and Item 2. The Result should look something like the following for Participant-Type A:
TypeA <- data.frame(c("Item 1","Item 2"),c(0,1), c(0,0),c(1,0),c(0,0),c(0,0),c(0,1),c(1,0), row.names=1)
colnames(TypeA)<-c("-3","-2","-1","0","1","2","3")
-3 -2 -1 0 1 2 3
Item 1 0 0 1 0 0 0 1
Item 2 1 0 0 0 0 1 0
In the original Data-Set there are also a lot of NA values.
I know how to select only the rows from a certain Participant-Type. I also found, that table(df$Item1) gives me the count of the variables of Column Item 1 in the Data Frame, but values which where not used are not part of the result.
I assume that there is no solution to query the data frame as a whole, so that I must go Column by Column and merge the results to a DataFrame.
But so far I didn't find out how to compare the Column with the vector respectively how to get the count for the variables of the Vector per Column of the DataFrame. Counting the unique Values per Column would show any count (= 0) for the variables in the Vector which were not used by the Participants.
Any help or advise is really appreciated!
Edit:
The original DataFrame has 75 observations (Rows) of 22 variables (Columns), if that is relevant.
Upvotes: 2
Views: 684
Reputation: 79338
In base R you could do:
grps <- interaction(df$`Participant-Type`, col(df[-1]))# Create the groupings
use_vec <- factor(unlist(df[-1]), vector) # Unlist your data
do.call(rbind, tapply(use_vec, grps, table)) #Resulting table
-3 -2 -1 0 1 2 3
A.1 0 0 1 0 0 0 1
B.1 0 0 0 0 1 1 0
C.1 0 0 0 0 0 0 1
A.2 1 0 0 0 0 1 0
B.2 0 0 2 0 0 0 0
C.2 0 0 0 0 0 1 0
Upvotes: 1
Reputation: 40171
One dplyr
and tidyr
option could be:
df %>%
pivot_longer(-1) %>%
count(name, value) %>%
complete(value = vec, nesting(name), fill = list(n = 0)) %>%
pivot_wider(names_from = "value", values_from = "n")
name `-3` `-2` `-1` `0` `1` `2` `3`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Item 1 0 0 1 0 1 1 2
2 Item 2 1 0 2 0 0 2 0
Or if aggregation at Participant-Type is needed:
df %>%
pivot_longer(-1) %>%
count(name, `Participant-Type`, value) %>%
complete(value = vec, nesting(name, `Participant-Type`), fill = list(n = 0)) %>%
pivot_wider(names_from = "value", values_from = "n")
name `Participant-Type` `-3` `-2` `-1` `0` `1` `2` `3`
<chr> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Item 1 A 0 0 1 0 0 0 1
2 Item 1 B 0 0 0 0 1 1 0
3 Item 1 C 0 0 0 0 0 0 1
4 Item 2 A 1 0 0 0 0 1 0
5 Item 2 B 0 0 2 0 0 0 0
6 Item 2 C 0 0 0 0 0 1 0
Upvotes: 3