Reputation: 59

Paste colnames by sequence

Hi and happy new year at all.

I have a tricky task (in my opinion) and I can not find a way to solve it. Please see following toy data. The orginal dataset has hundreds of cols/rows.

test<-data.frame(name=c("Amber","Thomas","Stefan","Zlatan"),
             US=c(8,2,NA,7),
             UK=c(5,4,1,7))

I want to create a new column, called "origin", which pastes the colname of each cell (without NA) seperated by "|" under consideration of the corresponding value. Higher values should be pasted first. As for same values (like Zlatan), the sequence isn´t relevant. Output for Zlatan could be US|UK OR UK|US.

This is the desired ouput:

I tried some hours to solve it but no approach worked. May be it make sense to convert the values as.factor...

Help is much appreciated. Thank you in advance!

Upvotes: 0

Answers (3)

Ian Campbell

Reputation: 24838

Here's a dplyr approach. First, we can use rowwise to work on individual rows independently. Next, we can use c_across which allows us to select values from that row only. We can subset a vector of c("US","UK") based on whether the US and UK columns are not NA.

paste with collapse = "|" allows us to put the values together with the seperator. I added a row to see what would happen if they are both NA.

library(dplyr)
test %>%
   rowwise() %>%
   mutate(origin = paste(c("US","UK")[rev(order(c_across(US:UK), na.last = NA))], collapse = "|"))
# A tibble: 5 x 4
# Rowwise: 
  name      US    UK origin 
  <chr>  <dbl> <dbl> <chr>  
1 Amber      8     5 "US|UK"
2 Thomas     2     4 "UK|US"
3 Stefan    NA     1 "UK"   
4 Zlatan     7     7 "UK|US"
5 Bob       NA    NA ""

This is also trivially expanded to more columns:

test<-data.frame(name=c("Amber","Thomas","Stefan","Zlatan","Bob"),
                 US=c(8,2,NA,7,NA),
                 UK=c(5,4,1,7,NA),
                 AUS=c(1,2,NA,NA,1))

test %>%
   rowwise() %>%
   mutate(origin = paste(c("US","UK","AUS")[rev(order(c_across(US:AUS), na.last = NA))], collapse = "|"))
# A tibble: 5 x 5
# Rowwise: 
  name      US    UK   AUS origin   
  <chr>  <dbl> <dbl> <dbl> <chr>    
1 Amber      8     5     1 US|UK|AUS
2 Thomas     2     4     2 UK|AUS|US
3 Stefan    NA     1    NA UK       
4 Zlatan     7     7    NA UK|US    
5 Bob       NA    NA     1 AUS

Or with tidyselect assistance to perform all columns but name:

test %>%
  rowwise() %>%
  mutate(origin = paste(names(across(-name))[rev(order(c_across(-name), na.last = NA))], collapse = "|"))

Upvotes: 4

kikoralston

Reputation: 1236

Another possibility with tidyverse. It is longer than the other two solutions, but it should work directly with a dataframe with as many columns as you need.

I changed the dataframe to long format, filtered out NAs, grouped by name, summarized using paste, and joined with the original dataframe to get the original columns (and rows with all NAs).

library(tidyverse)

test<-data.frame(name=c("Amber","Thomas","Stefan","Zlatan","Bob"),
                 US=c(8,2,NA,7,NA),
                 UK=c(5,4,1,7,NA),
                 AUS=c(1,2,NA,NA,1))
test %>%
  # change to long format
  tidyr::pivot_longer(cols=-name, names_to = "country", values_to = "value") %>%
  # remove rows with NA
  dplyr::filter(!is.na(value)) %>%
  # group by name and sort
  dplyr::group_by(name) %>% dplyr::arrange(-value) %>%
  # create summary of countries for each name in column 'origin'
  dplyr::summarise(origin=paste(country, collapse = "|")) %>%
  # join with original data frame to include original columns (and names with only NA) and change NA to '' in origin
  dplyr::right_join(test, by='name') %>% dplyr::mutate(origin=ifelse(is.na(origin), '', origin)) %>%
  # move origin column to end
  dplyr::relocate(origin, .after = last_col())

Result

name      US    UK   AUS origin   
  <chr>  <dbl> <dbl> <dbl> <chr>    
1 Amber      8     5     1 US|UK|AUS
2 Bob       NA    NA     1 AUS      
3 Stefan    NA     1    NA UK       
4 Thomas     2     4     2 UK|US|AUS
5 Zlatan     7     7    NA US|UK

Upvotes: 2

Mario Niepel

Reputation: 1165

Here's a different tidyverse solution using case_when:

library(tidyverse)
data <- data.frame (test<-data.frame(
    "name" =c("Amber","Thomas","Stefan","Zlatan"),
    "US" =c(8,2,NA,7),
    "UK" =c(5,4,1,7)))

data <- data %>% mutate(origin = case_when( US >  UK ~ "US|UK", 
                                    UK >= US ~ "UK|US",
                                    is.na(UK) & !is.na(US) ~ "US", 
                                    is.na(US) & !is.na(UK) ~ "UK"))
data
#>     name US UK origin
#> 1  Amber  8  5  US|UK
#> 2 Thomas  2  4  UK|US
#> 3 Stefan NA  1     UK
#> 4 Zlatan  7  7  UK|US

^{Created on 2021-01-06 by the reprex package (v0.3.0)}

Upvotes: 1

Paste colnames by sequence

Answers (3)

Related Questions