Reputation: 418
I have the following script so far that successfully creates a new column in my dataframe and populates it with a sum of how many times the value "TRUE" appears for each row of the dataframe:
data_1 <- data_1 %>% mutate(True_Count = rowSums(across(-c(Community.name), `%in%`, TRUE)))
You will notice that after I bring in the across function, I specify that I want to drop a column from the function. However, I actually don't want to drop any columns from my function. I tried writing something like
across(data_1 %in%, TRUE)
to indicate I want to go across the whole dataframe/all columns, but this is not the correct syntax.
Also, I tried to do this a much simpler way using just rowSums and no mutate, as follows:
data_1$True_Count <- rowSums(df == TRUE)
but all this did was create an empty column called True_Count and did not count the occurrences of TRUE logical values in each row. I also tried the same thing using a random string value that I know occurs exactly one time in my dataset:
data_1$True_Count <- rowSums(df == "banana")
but this did the same thing -- it created an empty column and did not count the instance of banana in my dataset.
Lastly, there was one more behavior that I did not understand. If I run the first code,
data_1 <- data_1 %>% mutate(True_Count = rowSums(across(-c(Community.name), `%in%`, TRUE)))
more than once, the counts in the True_Count column cease to be correct.
Upvotes: 0
Views: 241
Reputation: 388972
It is really helpful if you share data in a reproducible format with the expected output so that everyone is on the same page regarding understanding of the question.
Since you did not share an example, I created one myself to explain the answer here. I have added 4 random columns with TRUE/FALSE values since it seems this is what your dataset contains.
data_1 <- data.frame(Community.name = c(T, F, T, F, F),
                     Community.code = c(T, F, F, T, T),
                     col1 = T,
                     col2 = c(F, F, T, F, F))
data_1
#  Community.name Community.code col1  col2
#1           TRUE           TRUE TRUE FALSE
#2          FALSE          FALSE TRUE FALSE
#3           TRUE          FALSE TRUE  TRUE
#4          FALSE           TRUE TRUE FALSE
#5          FALSE           TRUE TRUE FALSE
Note that TRUE (logical) is different from "TRUE" (character). So first verify whether your dataset contains logical values or character values before trying out the answers below.
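One quick way to run that check is to look at the class of each column. The sketch below uses a small made-up data frame (check_df and its column names are hypothetical, not from your data) with one logical column and one character column that only looks logical.

```r
# Hypothetical data: flag_lgl holds logical values,
# flag_chr holds the strings "TRUE"/"FALSE".
check_df <- data.frame(
  flag_lgl = c(TRUE, FALSE, TRUE),
  flag_chr = c("TRUE", "FALSE", "TRUE"),
  stringsAsFactors = FALSE
)

# Class of every column, returned as a named character vector
col_classes <- sapply(check_df, class)
print(col_classes)

# Names of the columns that are genuinely logical
logical_cols <- names(check_df)[sapply(check_df, is.logical)]
print(logical_cols)
```

Running the same sapply(data_1, class) call on your own data should tell you which of the two cases you are in.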
This is your current code, where you are dropping Community.name and calculating the number of TRUE values in the remaining columns of each row.
library(dplyr)
data_2 <- data_1 %>%
  mutate(True_Count = rowSums(across(-c(Community.name), `%in%`, TRUE)))
data_2
#  Community.name Community.code col1  col2 True_Count
#1           TRUE           TRUE TRUE FALSE          2
#2          FALSE          FALSE TRUE FALSE          1
#3           TRUE          FALSE TRUE  TRUE          2
#4          FALSE           TRUE TRUE FALSE          2
#5          FALSE           TRUE TRUE FALSE          2
This seems to work as expected. We ignore Community.name and calculate the number of TRUE values in the remaining columns of each row.
Now your question,
I actually don't want to drop any columns from my function.
For that you can use everything() in across to include all the columns.
data_3 <- data_1 %>%
  mutate(True_Count = rowSums(across(everything(), `%in%`, TRUE)))
data_3
#  Community.name Community.code col1  col2 True_Count
#1           TRUE           TRUE TRUE FALSE          3
#2          FALSE          FALSE TRUE FALSE          1
#3           TRUE          FALSE TRUE  TRUE          3
#4          FALSE           TRUE TRUE FALSE          2
#5          FALSE           TRUE TRUE FALSE          2
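Also note that everything() is the default for .cols in across (see ?across) in the dplyr version used here; newer releases may require you to supply .cols explicitly.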
Also, I tried to do this a much simpler way using just rowSums and no mutate
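Yes, using rowSums with no mutate is a much simpler way of getting the same answer. Note that your attempt calls rowSums on df rather than data_1, so make sure you are referring to the right object. Since every column in this example is logical, rowSums can be applied to the data frame directly.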
data_1$True_Count <- rowSums(data_1)
data_1
#  Community.name Community.code col1  col2 True_Count
#1           TRUE           TRUE TRUE FALSE          3
#2          FALSE          FALSE TRUE FALSE          1
#3           TRUE          FALSE TRUE  TRUE          3
#4          FALSE           TRUE TRUE FALSE          2
#5          FALSE           TRUE TRUE FALSE          2
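If your real data also has non-logical columns (for example, Community.name holding text), rowSums on the bare data frame will fail because it needs numeric or logical input. A minimal sketch of one workaround, assuming a made-up data frame called mixed, is to compare against TRUE first and sum the resulting logical matrix:

```r
# Hypothetical data with a character column and a missing value
mixed <- data.frame(
  Community.name = c("a", "b", "c"),
  col1 = c(TRUE, FALSE, TRUE),
  col2 = c(TRUE, NA, FALSE),
  stringsAsFactors = FALSE
)

# mixed == TRUE gives a logical matrix; na.rm = TRUE skips missing cells
true_count <- rowSums(mixed == TRUE, na.rm = TRUE)
print(true_count)
```

The comparison is done column by column, so the text column simply contributes zero to each row.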
Lastly there was one more behavior that I did not understand. If I run the first code, more than once, the counts in the True_Count column cease to be correct.
That might be because initially you don't have a True_Count column in the dataset. The first time you run the code, the True_Count column is added to data_1. When you run the code a second time, across also picks up True_Count and includes it in the calculation, which is something you don't want.
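One way to make the code safe to re-run is to leave the count column out of the calculation explicitly. Below is a minimal base R sketch; count_true is a hypothetical helper written for this answer, not a function from any package. The reason a second run changes the result is that a numeric count of 1 compares equal to TRUE, so such rows get counted again.

```r
# Example data with logical columns only
data_1 <- data.frame(
  Community.code = c(TRUE, FALSE, FALSE, TRUE, TRUE),
  col1 = TRUE,
  col2 = c(FALSE, FALSE, TRUE, FALSE, FALSE)
)

# Hypothetical helper: counts TRUE per row, ignoring the count column itself
count_true <- function(df, count_col = "True_Count") {
  cols <- setdiff(names(df), count_col)   # every column except the count
  df[[count_col]] <- rowSums(df[cols] == TRUE, na.rm = TRUE)
  df
}

once  <- count_true(data_1)
twice <- count_true(once)   # second run gives the same counts as the first

print(once$True_Count)
print(twice$True_Count)
```

With dplyr, the same idea would be to exclude True_Count in the across selection before computing the sum.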
Upvotes: 1