CCC
CCC

Reputation: 69

Combining column values with column names for some select columns using tidyr unite

Given a dataframe:

df <- data.frame(Col1 = LETTERS[1:4], Col2 = LETTERS[23:26], Col3 = c(1:4), col4 = c(100:103))

I want to combine column with their column names. I know I can use unite from tidyr and get the following output.

df %>% unite(NewCol, c(Col1, Col4), remove = F)

  Col1 Col2 Col3 Col4 NewCol
1    A    W    1  100  A_100
2    B    X    2  101  B_101
3    C    Y    3  102  C_102
4    D    Z    4  103  D_103

But I want to have the column name next to the value of the column as follows (the separator _ is really not that important):

  Col1 Col2 Col3 Col4 NewCol
1    A    W    1  100  Col1_A_Col4_100
2    B    X    2  101  Col1_B_Col4_101
3    C    Y    3  102  Col1_C_Col4_102
4    D    Z    4  103  Col1_D_Col4_103

I tried the solution posted here which does give the desired output but it creates a separate output.

imap_dfr(df %>% select(Col1, Col4), ~ paste(.y, .x, sep = "_")) %>%
  unite(NewCol, sep = "_")

  NewCol         
  <chr>          
1 Col1_A_Col4_100
2 Col1_B_Col4_101
3 Col1_C_Col4_102
4 Col1_D_Col4_103

Would I simply use bind_cols() to combine both? How do I know the sequence of the rows is preserved between the two? Is there another way that I can create NewCol within the same dataframe similar to unite in the first case?

Upvotes: 2

Views: 516

Answers (1)

jared_mamrot
jared_mamrot

Reputation: 26695

One option is to create temporary 'colname + value' columns, then unite them in a second step, e.g.

## Load libraries
library(tidyverse)

## Load example data
df <- data.frame(Col1 = LETTERS[1:4], Col2 = LETTERS[23:26], Col3 = c(1:4), Col4 = c(100:103))

## Expected outcome
df %>% bind_cols(imap_dfr(df %>% select(Col1, Col4),
                          ~ paste(.y, .x, sep = "_")) %>%
                   unite(newcol, sep = "_"))
#>   Col1 Col2 Col3 Col4          newcol
#> 1    A    W    1  100 Col1_A_Col4_100
#> 2    B    X    2  101 Col1_B_Col4_101
#> 3    C    Y    3  102 Col1_C_Col4_102
#> 4    D    Z    4  103 Col1_D_Col4_103

## With a small number of columns
df %>%
  mutate(tmp_Col1 = paste0("Col1", "_", Col1),
         tmp_Col4 = paste0("Col4", "_", Col4)) %>%
  unite(newcol, c(tmp_Col1, tmp_Col4), sep = "_")
#>   Col1 Col2 Col3 Col4          newcol
#> 1    A    W    1  100 Col1_A_Col4_100
#> 2    B    X    2  101 Col1_B_Col4_101
#> 3    C    Y    3  102 Col1_C_Col4_102
#> 4    D    Z    4  103 Col1_D_Col4_103

## With a large number of columns
df %>%
  mutate(across(c(Col1, Col4),
                ~paste0(cur_column(), "_", .x))) %>%
  unite(newcol, c(Col1, Col4), sep = "_") %>%
  left_join(df)
#> Joining with `by = join_by(Col2, Col3)`
#>            newcol Col2 Col3 Col1 Col4
#> 1 Col1_A_Col4_100    W    1    A  100
#> 2 Col1_B_Col4_101    X    2    B  101
#> 3 Col1_C_Col4_102    Y    3    C  102
#> 4 Col1_D_Col4_103    Z    4    D  103

Created on 2023-06-22 with reprex v2.0.2

If you have a large number of columns you want to transform, using across() allows you to employ tidyselect functions, such as starts_with(), to select columns of interest without having to specify each column by name.

Upvotes: 2

Related Questions