Flora Grappelli
Flora Grappelli

Reputation: 679

Fill a dataframe by matching the values of a vector with colnames

I need to fill a dataframe in R by matching the values of a vector with column names. The column names do not match exactly the names of the vector. Below is an example:

df = data.frame(matrix(NA, ncol = 3, nrow = 4));
colnames(df) = c("AA", "BB", "CC")
vec1 = c(1,2,3) ; names(vec1) = c("BB745","456AA",  "123CC3") ;
vec2 = c(4,5,6) ; names(vec2) = c("BB745","456AA",  "123CC3") 

The first lines of df should be

2 1 3
5 4 6

What would be the simpliest way to achieve this?

Thanks in advance!

Upvotes: 0

Views: 864

Answers (3)

henryn
henryn

Reputation: 1237

If you just want a base R approach, something like this could work:

vec1 <- c(1,2,3) ; names(vec1) <- c("BB745","456AA", "123CC3") 
vec2 <- c(3,4,5) ; names(vec2) <- c("BB845","545AA", "456CC3") 

my_fun <- function(vec) {
  headers <- c("AA", "BB", "CC")
  new_vals <- sapply(headers, 
                function(x) {
                  idx <- grepl(x, names(vec))
                  vec[idx]
                })
  names(new_vals) <- headers
  new_vals
}
row_list <- lapply(list(vec1, vec2), my_fun)
data.frame(do.call(rbind, row_list))

Output:

  AA BB CC
1  2  1  3
2  4  3  5

Upvotes: 1

Edo
Edo

Reputation: 7818

Based on your example, I suppose that you have many vecs that you want to set all together in one dataframe based part common parts in their names.

So for example, you have this three vecs:

vec1 <- setNames(c(1,2,3), c("BB745","456AA",  "123CC3"))
vec2 <- setNames(c(3,2,1), c("BB345","113AA",  "543CC6"))
vec3 <- setNames(c(2,1,3), c("BB567","AA111",  "345CC8"))

And you want to set them together based on the presence of the following strings:

nn <-  c("AA", "BB", "CC")

In that case, I suggest you to proceed as follow.

Set up your vec in one list:

l <- list(vec1,vec2,vec3)

Then use this one line:

library(purrr)
library(stringr)

nn_clps <- paste(nn, collapse = "|")
map_dfr(l, ~setNames(., str_extract(names(.), nn_clps))) %>% select(all_of(nn))

#> # A tibble: 3 x 3
#>      AA    BB    CC
#>   <dbl> <dbl> <dbl>
#> 1     2     1     3
#> 2     2     3     1
#> 3     1     2     3

What does it do?

  • We apply a function to each vector with map from the purrr package
  • In this way, we take one vec a time and extract from the names the strings in nn: so one of AA, BB or CC. To do this we use str_extract from the stringr package.
  • Then we rename each vector with the strings we extracted. We do this with setNames
  • At the end, all the strings are set together in one unique dataframe. This is performed by map_dfr, which act like map but define the output as one unique dataframe (dfr stands for bind rows to a dataframe).
  • to set up the columns in the intended orders, there are many ways. One could be with select from the dplyr package. all_of just translates the strings in names understandable from select. Also [nn] instead of %>% select(all_of(nn)) would have worked fine. Like this:
map_dfr(l, ~setNames(., str_extract(names(.), nn_clps)))[nn]

Check out ?map to understand what . and ~ do. In short: ~ is needed to create an anonymous function and . is a placeholder of x in the anonymous function.


All the code together. I slightly modified it to set up the column in the order you intended.

vec1 <- setNames(c(1,2,3), c("BB745","456AA","123CC3"))
vec2 <- setNames(c(3,2,1), c("BB345","113AA","543CC6"))
vec3 <- setNames(c(2,1,3), c("BB567","AA111","345CC8"))
nn <- c("AA", "BB", "CC")
l <- list(vec1,vec2,vec3)
library(purrr)
library(stringr)
library(dplyr)
nn_clps <- paste(nn, collapse = "|")
map_dfr(l, ~setNames(., str_extract(names(.), nn_clps))) %>% select(all_of(nn))

Upvotes: 1

Duck
Duck

Reputation: 39585

Try this approach using tidyverse functions. You can adapt the vector in a dataframe and then clean up the values to make a match using left_join(). Here the code:

library(tidyverse)
#Code
newdf <- df %>% mutate(Ref=row_number()) %>%
  pivot_longer(-Ref) %>%
  left_join(
    vec1 %>% data.frame() %>% set_names(.,'Val') %>% rownames_to_column('name') %>%
      mutate(name=gsub("[[:digit:]]", "", name))
  ) %>%
  mutate(value=Val) %>% select(-Val) %>%
  pivot_wider(names_from = name,values_from=value) %>%
  select(-Ref)

Output:

# A tibble: 4 x 3
     AA    BB    CC
  <dbl> <dbl> <dbl>
1     2     1     3
2     2     1     3
3     2     1     3
4     2     1     3

Upvotes: 1

Related Questions