Reputation: 497
My data frame looks like this:
x s1 s2 s3 s4
1 x1 1 1954 1 yes
2 x2 2 1955 1 no
3 x3 1 1976 2 yes
4 x4 2 1954 2 yes
5 x5 3 1943 1 no
Sample data:
df <- data.frame(x=c('x1','x2','x3','x4','x5'),
s1=c(1,2,1,2,3),
s2=c(1954,1955,1976,1954,1943),
s3=c(1,1,2,2,1),
s4=c('yes','no','yes','yes','no'))```
Is it possible to extract the data frame's columns containing integers 1
to 3
? For example, the new data frame would look like:
newdf
x s1 s3
1 x1 1 1
2 x2 2 1
3 x3 1 2
4 x4 2 2
5 x5 3 1
Is it possible to change the s1
and s3
columns to 0 or 1 depending on whether or not the value in the column is 1? The altered data frame would then look like:
newdf2
x s1 s3
1 x1 1 1
2 x2 0 1
3 x3 1 0
4 x4 0 0
5 x5 0 1
Upvotes: 0
Views: 1749
Reputation: 160387
newdf <- df[, unique(c("x", names(which(sapply(df, function(z) is.numeric(z) & any(c(1, 3) %in% z)))))), drop = FALSE]
newdf
# x s1 s3
# 1 x1 1 1
# 2 x2 2 1
# 3 x3 1 2
# 4 x4 2 2
# 5 x5 3 1
newdf[-1] <- lapply(newdf[-1], function(z) +(z == 1))
newdf
# x s1 s3
# 1 x1 1 1
# 2 x2 0 1
# 3 x3 1 0
# 4 x4 0 0
# 5 x5 0 1
Walk-through:
first, we determine which columns are numbers and contain the numbers 1 or 3:
sapply(df, function(z) is.numeric(z) & any(c(1, 3) %in% z))
# x s1 s2 s3 s4
# FALSE TRUE FALSE TRUE FALSE
This will exclude any column that is not numeric, meaning that a character
column that contains a literal "1"
or "3"
will not be retained. This is complete inference on my end; if you want to accept the string versions then remove the is.numeric(z)
component.
second, we extract the names of those that are true, and prepend "x"
c("x", names(which(sapply(df, function(z) is.numeric(z) & any(c(1, 3) %in% z)))))
# [1] "x" "s1" "s3"
wrap that in unique(.)
if, for some reason, "x"
is also numeric and contains 1 or 3 (this step is purely defensive, you may not strictly need it)
select those columns, defensively adding drop=FALSE
so that if only one column is matched, it still returns a full data.frame
replace just those columns (excluding the first column which is "x"
) with 0 or 1; the z == 1
returns logical
, and the wrapping +(..)
converts logical to 0 (false) or 1 (true).
library(dplyr)
df %>%
select(x, where(~ is.numeric(.) & any(c(1, 3) %in% .))) %>%
mutate(across(-x, ~ +(. == 1)))
# x s1 s3
# 1 x1 1 1
# 2 x2 0 1
# 3 x3 1 0
# 4 x4 0 0
# 5 x5 0 1
Upvotes: 2
Reputation: 351
I think this is what you expect :
my_df <- data.frame(x=c('x1','x2','x3','x4','x5'),
s1=c(1,2,1,2,3),
s2=c(1954,1955,1976,1954,1943),
s3=c(1,1,2,2,1),
s4=c('yes','no','yes','yes','no'))
my_df$end <- apply(my_df, 2, function(x) paste(x, collapse = " "))
my_df <- my_df %>% group_by(x) %>% mutate(end2 = paste(str_extract_all(string = end, pattern = "1|2|3", simplify = TRUE), collapse = " "))
my_var <- which(my_df$end == my_df$end2)
my_df[, my_var] <- t(apply(my_df[, my_var], 1, function(x) ifelse(test = x == 1, yes = 1, no = 0)))
my_df <- my_df[, c(1, my_var)]
Upvotes: 0