Reputation: 1118
I am working with two tables:
t1<-data.frame(Name=c("Waldo","Mark","Harold","Earl"),Number=c(1,4,3,9))
and
t2<-data.frame(Whatever=c("does","not","really","matter","at","all"),Waldo=c(0,1,1,0,0,1),Mark=c(1,0,1,1,0,0),Harold=c(0,1,0,0,0,0),Earl=c(1,1,1,1,0,0),Extra=c("another","column","appearing","in","this","table"))
What I would like to do is replace the 1's in t2
with the lookup values from t1
. The column names for t2
appear as records in t1
. All 0 values in t2
should stay intact.
In my true data, there are hundreds of columns in t2
, and hundreds of rows in t1
.
There are also several columns in t2
that should not be impacted by this coding, but should remain in the final output.
Is there a best practice for coding this?
Desired output for the example is below:
Whatever Waldo Mark Harold Earl Extra
does 0 4 0 9 another
not 1 0 3 9 column
really 1 4 0 9 appearing
matter 0 4 0 9 in
at 0 0 0 0 this
all 1 0 0 0 table
Thank you in advance!
Upvotes: 0
Views: 58
Reputation: 16832
Here's a tidyverse
workflow that may be a little excessive for this example, but should scale well for larger datasets. I'll break it into steps so it isn't too convoluted going from wide-shaped data to long to wide again:
First I reshape t2
into a long format and filter for observations with 1:
library(tidyverse)
t2 %>%
gather(key = Name, value = value, -Whatever, -Extra) %>%
filter(value == 1)
#> Whatever Extra Name value
#> 1 not column Waldo 1
#> 2 really appearing Waldo 1
#> 3 all table Waldo 1
#> 4 does another Mark 1
#> 5 really appearing Mark 1
#> 6 matter in Mark 1
#> 7 not column Harold 1
#> 8 does another Earl 1
#> 9 not column Earl 1
#> 10 really appearing Earl 1
#> 11 matter in Earl 1
Then I join that with t1
, using left_join
in case there are any observations in t2
that don't have matches in t1
. That gets me the Number
column from t1
, so now I can drop the value
column from gathering:
t2 %>%
gather(key = Name, value = value, -Whatever, -Extra) %>%
filter(value == 1) %>%
left_join(t1, by = "Name") %>%
select(-value)
#> Whatever Extra Name Number
#> 1 not column Waldo 1
#> 2 really appearing Waldo 1
#> 3 all table Waldo 1
#> 4 does another Mark 4
#> 5 really appearing Mark 4
#> 6 matter in Mark 4
#> 7 not column Harold 3
#> 8 does another Earl 9
#> 9 not column Earl 9
#> 10 really appearing Earl 9
#> 11 matter in Earl 9
Then I use spread
to bring it back to a wide format. Note that these functions create factors to order them, so at the end the spread columns are in alphabetical order. You can use select
to change the order of columns if you need to.
The start to finish process:
t2 %>%
gather(key = Name, value = value, -Whatever, -Extra) %>%
filter(value == 1) %>%
left_join(t1, by = "Name") %>%
select(-value) %>%
spread(key = Name, value = Number, fill = 0)
#> Whatever Extra Earl Harold Mark Waldo
#> 1 all table 0 0 0 1
#> 2 does another 9 0 4 0
#> 3 matter in 9 0 4 0
#> 4 not column 9 3 0 1
#> 5 really appearing 9 0 4 1
Created on 2018-08-14 by the reprex package (v0.2.0).
Upvotes: 1
Reputation: 2856
This should be flexible enough for your actual dataset:
my_function <- function(df, lookup) {
for(i in names(df)) {
df[[as.character(i)]][df[[as.character(i)]] == 1] <- lookup$Number[lookup$Name == as.character(i)]
}
return(df)
}
my_function(t2, t1)
# Whatever Waldo Mark Harold Earl Extra
# 1 does 0 4 0 9 another
# 2 not 1 0 3 9 column
# 3 really 1 4 0 9 appearing
# 4 matter 0 4 0 9 in
# 5 at 0 0 0 0 this
# 6 all 1 0 0 0 table
Upvotes: 1