Jack Olive
Jack Olive

Reputation: 33

Moving elements from column to column in r

I have a dataframe that looks like this (but for every US county)

county state n_state_1 n_state_2 n_state_3 n_state_4
Autauga County AL NA FL NA NA
Baldwin County AL GA NA TN NA
Catron County AL FL GA NA CA

I want to move the non-missing values (FL,GA,TN etc.) to the first columns starting from n_state_1 and then delete the columns containing only missing values to get:

county state n_state_1 n_state_2 n_state_3
Autauga County AL FL NA NA
Baldwin County AL GA TN NA
Catron County AL FL GA CA

I am struggling with the first step. I thought about using the function distinct but it doesn't work because there are non-empty elements in each column.

Upvotes: 3

Views: 108

Answers (2)

akrun
akrun

Reputation: 886938

Or another option with dapply from collapse and select only columns with any non-NA elements

library(collapse)
library(dplyr)
dapply(df1, MARGIN = 1, FUN = function(x) c(x[!is.na(x)], x[is.na(x)])) %>%
   select(where(~ any(complete.cases(.))))
# A tibble: 3 x 5
  county         state n_state_1 n_state_2 n_state_3
  <chr>          <chr> <chr>     <chr>     <chr>    
1 Autauga_County AL    FL        <NA>      <NA>     
2 Baldwin_County AL    GA        TN        <NA>     
3 Catron_County  AL    FL        GA        CA       

Upvotes: 2

Martin Gal
Martin Gal

Reputation: 16978

You could use dplyr and tidyr:

library(dplyr)
library(tidyr)

df %>% 
  pivot_longer(starts_with("n_state")) %>% 
  drop_na() %>%
  group_by(county, state) %>% 
  mutate(name=row_number()) %>% 
  pivot_wider(names_prefix="n_state_")

which returns

  county         state n_state_1 n_state_2 n_state_3
  <chr>          <chr> <chr>     <chr>     <chr>    
1 Autauga_County AL    FL        NA        NA       
2 Baldwin_County AL    GA        TN        NA       
3 Catron_County  AL    FL        GA        CA  

What happened here?

  • pivot_longer takes the n_state_{n}-columns and collapses them into two columns: the name-column contains the original column name (n_state_1, n_state_2 etc), the value-column contains the states (FL, GA or <NA> in many cases).
  • Next we remove every <NA> entry. (Note: I use <NA> to make clear it's an NA-value).)
  • After a grouping by county and state we add a rownumber. These numbers will be later used to create the new column names.
  • pivot_wider now takes these row numbers and prefixes them with n_state_ to get the new columns. The values are taken from the value-column created in the second line of code. pivot_wider fills the missing values with <NA>-values (default behaviour).

Data

structure(list(county = c("Autauga_County", "Baldwin_County", 
"Catron_County"), state = c("AL", "AL", "AL"), n_state_1 = c(NA, 
"GA", "FL"), n_state_2 = c("FL", NA, "GA"), n_state_3 = c(NA, 
"TN", NA), n_state_4 = c(NA, NA, "CA")), problems = structure(list(
    row = 3L, col = "n_state_4", expected = "", actual = "embedded null", 
    file = "literal data"), row.names = c(NA, -1L), class = c("tbl_df", 
"tbl", "data.frame")), class = c("spec_tbl_df", "tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -3L), spec = structure(list(
    cols = list(county = structure(list(), class = c("collector_character", 
    "collector")), state = structure(list(), class = c("collector_character", 
    "collector")), n_state_1 = structure(list(), class = c("collector_character", 
    "collector")), n_state_2 = structure(list(), class = c("collector_character", 
    "collector")), n_state_3 = structure(list(), class = c("collector_character", 
    "collector")), n_state_4 = structure(list(), class = c("collector_character", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1L), class = "col_spec"))

Upvotes: 3

Related Questions