Reputation: 39
I have two dataframes (df1 & df2) with the same dimensions. df1 contains numerical values:
df1
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 1 11 21 31 41 51 61 71 81 91
2 2 12 22 32 42 52 62 72 82 92
3 3 13 23 33 43 53 63 73 83 93
4 4 14 24 34 44 54 64 74 84 94
5 5 15 25 35 45 55 65 75 85 95
6 6 16 26 36 46 56 66 76 86 96
7 7 17 27 37 47 57 67 77 87 97
8 8 18 28 38 48 58 68 78 88 98
9 9 19 29 39 49 59 69 79 89 99
10 10 20 30 40 50 60 70 80 90 100
and df2 contains strings:
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 today today today today yesterday today today today today tomorrow,today
2 today today today today today yesterday today today today today
3 today yesterday today today today today today yesterday today today
4 today today today today today today today today tomorrow,today today
5 today today today yesterday yesterday today today today today tomorrow
6 today today today yesterday today tomorrow,today today today today today
7 today today today today today yesterday today today today tomorrow,today
8 today yesterday today today today today today today today today
9 today today today today today today today today tomorrow,today today
10 today yesterday today yesterday today today today yesterday today today
please note that some cells contain more than one string (e.g. [1,V10])
Each cell of df1 corresponds to a cell in df2 (df1[1,1] <-> df2[1,1], df1[2,2] <-> df2[2,2], etc.).
Is there a smart way (preferably without using a for-loop) to only keep values in df1, where the string from the corresponding cell in df2 contains "today" and the rest is set to NA?
So the prefered output would be:
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 1 11 21 31 NA 51 61 71 81 91
2 2 12 22 32 42 NA 62 72 82 92
3 3 NA 23 33 43 53 63 NA 83 93
4 4 14 24 34 44 54 64 74 84 94
5 5 15 25 NA NA 55 65 75 85 NA
6 6 16 26 NA 46 56 66 76 86 96
7 7 17 27 37 47 NA 67 77 87 97
8 8 NA 28 38 48 58 68 78 88 98
9 9 19 29 39 49 59 69 79 89 99
10 10 NA 30 NA 50 60 70 NA 90 100
Thanks in advance!
Upvotes: 0
Views: 37
Reputation: 886938
We could use
replace(df1, !sapply(df2, grepl, pattern = "today"), NA)
-output
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 1 11 21 31 NA 51 61 71 81 91
2 2 12 22 32 42 NA 62 72 82 92
3 3 NA 23 33 43 53 63 NA 83 93
4 4 14 24 34 44 54 64 74 84 94
5 5 15 25 NA NA 55 65 75 85 NA
6 6 16 26 NA 46 56 66 76 86 96
7 7 17 27 37 47 NA 67 77 87 97
8 8 NA 28 38 48 58 68 78 88 98
9 9 19 29 39 49 59 69 79 89 99
10 10 NA 30 NA 50 60 70 NA 90 100
Or slightly more compact
df1 * NA^!grepl("today", as.matrix(df2))
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 1 11 21 31 NA 51 61 71 81 91
2 2 12 22 32 42 NA 62 72 82 92
3 3 NA 23 33 43 53 63 NA 83 93
4 4 14 24 34 44 54 64 74 84 94
5 5 15 25 NA NA 55 65 75 85 NA
6 6 16 26 NA 46 56 66 76 86 96
7 7 17 27 37 47 NA 67 77 87 97
8 8 NA 28 38 48 58 68 78 88 98
9 9 19 29 39 49 59 69 79 89 99
10 10 NA 30 NA 50 60 70 NA 90 100
df1 <- structure(list(V1 = 1:10, V2 = 11:20, V3 = 21:30, V4 = 31:40,
V5 = 41:50, V6 = 51:60, V7 = 61:70, V8 = 71:80, V9 = 81:90,
V10 = 91:100), class = "data.frame", row.names = c("1", "2",
"3", "4", "5", "6", "7", "8", "9", "10"))
df2 <- structure(list(V1 = c("today", "today", "today", "today", "today",
"today", "today", "today", "today", "today"), V2 = c("today",
"today", "yesterday", "today", "today", "today", "today", "yesterday",
"today", "yesterday"), V3 = c("today", "today", "today", "today",
"today", "today", "today", "today", "today", "today"), V4 = c("today",
"today", "today", "today", "yesterday", "yesterday", "today",
"today", "today", "yesterday"), V5 = c("yesterday", "today",
"today", "today", "yesterday", "today", "today", "today", "today",
"today"), V6 = c("today", "yesterday", "today", "today", "today",
"tomorrow,today", "yesterday", "today", "today", "today"), V7 = c("today",
"today", "today", "today", "today", "today", "today", "today",
"today", "today"), V8 = c("today", "today", "yesterday", "today",
"today", "today", "today", "today", "today", "yesterday"), V9 = c("today",
"today", "today", "tomorrow,today", "today", "today", "today",
"today", "tomorrow,today", "today"), V10 = c("tomorrow,today",
"today", "today", "today", "tomorrow", "today", "tomorrow,today",
"today", "today", "today")), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10"))
Upvotes: 1
Reputation: 16836
Here is another option using a combination of tidyverse
and base R:
library(tidyverse)
df1[!mutate(df2, across(everything(), ~ str_detect(.x, "today")))] <- NA
Output
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 1 11 21 31 NA 51 61 71 81 91
2 2 12 22 32 42 NA 62 72 82 92
3 3 NA 23 33 43 53 63 NA 83 93
4 4 14 24 34 44 54 64 74 84 94
5 5 15 25 NA NA 55 65 75 85 NA
6 6 16 26 NA 46 56 66 76 86 96
7 7 17 27 37 47 NA 67 77 87 97
8 8 NA 28 38 48 58 68 78 88 98
9 9 19 29 39 49 59 69 79 89 99
10 10 NA 30 NA 50 60 70 NA 90 100
Data
df1 <- structure(list(V1 = 1:10, V2 = 11:20, V3 = 21:30, V4 = 31:40,
V5 = 41:50, V6 = 51:60, V7 = 61:70, V8 = 71:80, V9 = 81:90,
V10 = 91:100), class = "data.frame", row.names = c("1", "2",
"3", "4", "5", "6", "7", "8", "9", "10"))
df2 <- structure(list(V1 = c("today", "today", "today", "today", "today",
"today", "today", "today", "today", "today"), V2 = c("today",
"today", "yesterday", "today", "today", "today", "today", "yesterday",
"today", "yesterday"), V3 = c("today", "today", "today", "today",
"today", "today", "today", "today", "today", "today"), V4 = c("today",
"today", "today", "today", "yesterday", "yesterday", "today",
"today", "today", "yesterday"), V5 = c("yesterday", "today",
"today", "today", "yesterday", "today", "today", "today", "today",
"today"), V6 = c("today", "yesterday", "today", "today", "today",
"tomorrow,today", "yesterday", "today", "today", "today"), V7 = c("today",
"today", "today", "today", "today", "today", "today", "today",
"today", "today"), V8 = c("today", "today", "yesterday", "today",
"today", "today", "today", "today", "today", "yesterday"), V9 = c("today",
"today", "today", "tomorrow,today", "today", "today", "today",
"today", "tomorrow,today", "today"), V10 = c("tomorrow,today",
"today", "today", "today", "tomorrow", "today", "tomorrow,today",
"today", "today", "today")), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10"))
Upvotes: 1