Reputation: 157
I have created two dataframes, with df.1 containing my main data.
ID A_ratio B_ratio C_ratio
1 0.9 7.6 3.5
2 3.1 4.4 0.7
3 6.3 8.2 1.2
The dataframe cut contains only one row.
A_cut B_cut C_cut
4.5 5.3 2.0
I now want to use the values stored in cut to binarize df.1, turning X_ratio <= X_cut into 1 and X_ratio > X_cut into 0. The new column could be called X_bin. I've tried the following dplyr approach:
df.2 <- df.1 %>%
mutate(across(ends_with("ratio"), ~if_else(. <= get(cut[str_replace(cur_column(),"ratio","cut")]), 1, 0)
.names = "{.col}_bin"))%>%
rename_with(~str_replace(.,"_ratio",""),contains("_ratio_"))
select(ID, ends_with("bin"))
But unfortunately I'm getting an Error: unexpected symbol. Could someone point out my mistake? The desired output in df.2 would be:
ID A_bin B_bin C_bin
1 1 0 0
2 1 1 1
3 0 0 1
Thanks a lot in advance!
Upvotes: 3
Views: 180
Reputation: 8880
Using purrr:
df <- structure(list(ID = 1:3, A_ratio = c(0.9, 3.1, 6.3), B_ratio = c(7.6,
4.4, 8.2), C_ratio = c(3.5, 0.7, 1.2)), class = "data.frame", row.names = c(NA,
-3L))
cut <- structure(list(A_cut = 4.5, B_cut = 5.3, C_cut = 2), class = "data.frame",
row.names = c(NA,
-1L))
library(purrr)
df[-1] <- +map2_dfc(df[-1], cut, ~.x <= .y)
df
#> ID A_ratio B_ratio C_ratio
#> 1 1 1 0 0
#> 2 2 1 1 1
#> 3 3 0 0 1
Created on 2021-04-02 by the reprex package (v1.0.0)
Upvotes: 2
Reputation: 388862
Base R answer:
df.1[-1] <- +(sweep(df.1[-1], 2, unlist(cut), `<=`))
df.1
# ID A_ratio B_ratio C_ratio
#1 1 1 0 0
#2 2 1 1 1
#3 3 0 0 1
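Both answers above overwrite the ratio columns in place, so the column names stay X_ratio, whereas the question asked for X_bin. A minimal base R sketch of one way to get the requested names follows; the object name cutoffs is a hypothetical stand-in for the one-row cut table, and the data is copied from the question.

```r
# Data from the question
df.1 <- data.frame(ID = 1:3,
                   A_ratio = c(0.9, 3.1, 6.3),
                   B_ratio = c(7.6, 4.4, 8.2),
                   C_ratio = c(3.5, 0.7, 1.2))

# Hypothetical name: the cutoffs as a plain named vector instead of a one-row data frame
cutoffs <- c(A_cut = 4.5, B_cut = 5.3, C_cut = 2)

# Compare each column with its own cutoff; unary + turns TRUE/FALSE into 1/0
bin <- as.data.frame(+(sweep(as.matrix(df.1[-1]), 2, cutoffs, `<=`)))

# Rename X_ratio to X_bin and put the ID column back in front
names(bin) <- sub("_ratio$", "_bin", names(bin))
df.2 <- cbind(df.1["ID"], bin)
df.2
```

This leaves df.1 untouched and builds df.2 separately, which is closer to what the question describes.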
Upvotes: 3
Reputation: 887008
There is a comma missing before the .names argument. Also, since we are extracting the column from cut directly, we don't need get. Finally, using transmute instead of mutate returns only the columns that are needed, so the last step with select can be removed.
library(dplyr)
library(stringr)
df.1 %>%
transmute(ID, across(ends_with("ratio"),
~if_else(. <= cut[[str_replace(cur_column(),"ratio","cut")]],
1, 0),
.names = "{.col}_bin")) %>%
rename_with(~str_replace(.,"_ratio",""),contains("_ratio_"))
Output:
# ID A_bin B_bin C_bin
#1 1 1 0 0
#2 2 1 1 1
#3 3 0 0 1
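To illustrate why get is unnecessary in the code above: get looks up an object by its name, while the cutoff is already stored in a column that can be pulled out directly. The sketch below uses base R only. The name cutoffs is a hypothetical replacement for cut, and sub stands in for str_replace.

```r
# Same one-row cutoff table as in the question, under a hypothetical name
cutoffs <- data.frame(A_cut = 4.5, B_cut = 5.3, C_cut = 2)

# Build the cutoff column name from the ratio column name
col_name <- sub("ratio", "cut", "A_ratio")   # "A_cut"

as_df  <- cutoffs[col_name]     # single brackets: still a 1 x 1 data frame
as_num <- cutoffs[[col_name]]   # double brackets: the bare numeric value

is.data.frame(as_df)   # TRUE
is.numeric(as_num)     # TRUE

# The bare value compares directly against a ratio column
a_ratio <- c(0.9, 3.1, 6.3)
a_ratio <= as_num      # TRUE TRUE FALSE
```

Double brackets return the value itself, which is what the comparison inside across needs.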
As we are returning binary columns, if_else is not really needed. The logical vector can be coerced to binary with as.integer or by wrapping it in +( ).
df.1 %>%
transmute(ID, across(ends_with("ratio"),
~as.integer(. <= cut[[str_replace(cur_column(),"ratio","cut")]]),
.names = "{.col}_bin")) %>%
rename_with(~str_replace(.,"_ratio",""),contains("_ratio_"))
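As a small illustration of the coercion mentioned above, the following base R sketch compares the two shortcuts on the A_ratio values from the question. The variable names are made up for the example.

```r
ratio  <- c(0.9, 3.1, 6.3)   # A_ratio values from the question
cutoff <- 4.5                # A_cut value from the question

flag <- ratio <= cutoff      # logical vector: TRUE TRUE FALSE

via_integer <- as.integer(flag)   # explicit coercion to integer
via_plus    <- +(flag)            # unary plus also yields an integer vector

identical(via_integer, via_plus)  # TRUE
via_integer                       # 1 1 0
```

Both forms return integers, while if_else(..., 1, 0) returns doubles. The values are the same either way.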
Note: cut is a function name, so it is better not to name objects after functions.
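A brief sketch of the naming point, assuming a hypothetical name cutoffs for the one-row table. R still finds the function when cut is called with parentheses, even if a data frame of the same name exists, so the concern is mostly readability and avoiding confusion.

```r
# Hypothetical, non-clashing name for the one-row cutoff table
cutoffs <- data.frame(A_cut = 4.5, B_cut = 5.3, C_cut = 2)

# base::cut remains unambiguous: bin the A_ratio values at the A cutoff
bins <- cut(c(0.9, 3.1, 6.3), breaks = c(0, cutoffs$A_cut, Inf))

as.integer(bins)   # 1 1 2: the first two values fall at or below the cutoff
```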
df.1 <- structure(list(ID = 1:3, A_ratio = c(0.9, 3.1, 6.3), B_ratio = c(7.6,
4.4, 8.2), C_ratio = c(3.5, 0.7, 1.2)), class = "data.frame", row.names = c(NA,
-3L))
cut <- structure(list(A_cut = 4.5, B_cut = 5.3, C_cut = 2), class = "data.frame",
row.names = c(NA,
-1L))
Upvotes: 3