Reputation: 679
I need to fill a dataframe in R by matching the values of a vector with column names. The column names do not match exactly the names of the vector. Below is an example:
df = data.frame(matrix(NA, ncol = 3, nrow = 4));
colnames(df) = c("AA", "BB", "CC")
vec1 = c(1,2,3) ; names(vec1) = c("BB745","456AA", "123CC3") ;
vec2 = c(4,5,6) ; names(vec2) = c("BB745","456AA", "123CC3")
The first lines of df
should be
2 1 3
5 4 6
What would be the simpliest way to achieve this?
Thanks in advance!
Upvotes: 0
Views: 864
Reputation: 1237
If you just want a base R approach, something like this could work:
vec1 <- c(1,2,3) ; names(vec1) <- c("BB745","456AA", "123CC3")
vec2 <- c(3,4,5) ; names(vec2) <- c("BB845","545AA", "456CC3")
my_fun <- function(vec) {
headers <- c("AA", "BB", "CC")
new_vals <- sapply(headers,
function(x) {
idx <- grepl(x, names(vec))
vec[idx]
})
names(new_vals) <- headers
new_vals
}
row_list <- lapply(list(vec1, vec2), my_fun)
data.frame(do.call(rbind, row_list))
Output:
AA BB CC
1 2 1 3
2 4 3 5
Upvotes: 1
Reputation: 7818
Based on your example, I suppose that you have many vec
s that you want to set all together in one dataframe based part common parts in their names.
So for example, you have this three vec
s:
vec1 <- setNames(c(1,2,3), c("BB745","456AA", "123CC3"))
vec2 <- setNames(c(3,2,1), c("BB345","113AA", "543CC6"))
vec3 <- setNames(c(2,1,3), c("BB567","AA111", "345CC8"))
And you want to set them together based on the presence of the following strings:
nn <- c("AA", "BB", "CC")
In that case, I suggest you to proceed as follow.
Set up your vec
in one list:
l <- list(vec1,vec2,vec3)
Then use this one line:
library(purrr)
library(stringr)
nn_clps <- paste(nn, collapse = "|")
map_dfr(l, ~setNames(., str_extract(names(.), nn_clps))) %>% select(all_of(nn))
#> # A tibble: 3 x 3
#> AA BB CC
#> <dbl> <dbl> <dbl>
#> 1 2 1 3
#> 2 2 3 1
#> 3 1 2 3
What does it do?
map
from the purrr
packagevec
a time and extract from the names the strings in nn
: so one of AA
, BB
or CC
. To do this we use str_extract
from the stringr
package.setNames
map_dfr
, which act like map
but define the output as one unique dataframe (dfr
stands for bind rows to a dataframe).select
from the dplyr
package. all_of
just translates the strings in names understandable from select
. Also [nn]
instead of %>% select(all_of(nn))
would have worked fine. Like this:map_dfr(l, ~setNames(., str_extract(names(.), nn_clps)))[nn]
Check out ?map
to understand what .
and ~
do.
In short: ~
is needed to create an anonymous function and .
is a placeholder of x
in the anonymous function.
All the code together. I slightly modified it to set up the column in the order you intended.
vec1 <- setNames(c(1,2,3), c("BB745","456AA","123CC3"))
vec2 <- setNames(c(3,2,1), c("BB345","113AA","543CC6"))
vec3 <- setNames(c(2,1,3), c("BB567","AA111","345CC8"))
nn <- c("AA", "BB", "CC")
l <- list(vec1,vec2,vec3)
library(purrr)
library(stringr)
library(dplyr)
nn_clps <- paste(nn, collapse = "|")
map_dfr(l, ~setNames(., str_extract(names(.), nn_clps))) %>% select(all_of(nn))
Upvotes: 1
Reputation: 39585
Try this approach using tidyverse
functions. You can adapt the vector in a dataframe and then clean up the values to make a match using left_join()
. Here the code:
library(tidyverse)
#Code
newdf <- df %>% mutate(Ref=row_number()) %>%
pivot_longer(-Ref) %>%
left_join(
vec1 %>% data.frame() %>% set_names(.,'Val') %>% rownames_to_column('name') %>%
mutate(name=gsub("[[:digit:]]", "", name))
) %>%
mutate(value=Val) %>% select(-Val) %>%
pivot_wider(names_from = name,values_from=value) %>%
select(-Ref)
Output:
# A tibble: 4 x 3
AA BB CC
<dbl> <dbl> <dbl>
1 2 1 3
2 2 1 3
3 2 1 3
4 2 1 3
Upvotes: 1