Use R dplyr to tidy data frame

Question

My dataframe df looks like this:

        Value
X.Y.Z   10
X.Y.K   20
X.Y.W   30
X.Y.Z.1 20
X.Y.K.1 5
X.Y.W.1 30
X.Y.Z.2 3
X.Y.K.2 23
X.Y.W.2 44

I'm trying to unpivot using the 3rd character of the row names to name the columns, like:

So, the row names are now the last character of the rows (after the dot). I know this is possible to do with dplyr, I've tried gather, and spread, but no luck, can anyone help?

EDIT: Here's the data above in text, I:

structure(list(..1 = c("X.Y.Z", "X.Y.K", "X.Y.W", "X.Y.Z.1", 
"X.Y.K.1", "X.Y.W.1", "X.Y.Z.2", "X.Y.K.2", "X.Y.W.2"), Value = c(10, 
20, 30, 20, 5, 30, 3, 23, 44)), class = "data.frame", row.names = c(NA, 
-9L))

M-- · Accepted Answer

Solution that seems to work for OP:

library(dplyr)
library(tibble)
library(tidyr)

df1 %>% 
    rownames_to_column %>% 
    transmute(mycols = gsub('^.*\.', '', gsub('.[[:digit:]]+', '', rowname)),
              myrows = regmatches(rowname, gregexpr('[0-9]+',rowname)),
              value = Value) %>% 
    spread(key=mycols, value=value)

  #   myrows  K  W  Z
  # 1        20 30 10
  # 2      1  5 30 20
  # 3      2 23 44  3

First Version of my Answer:

library(dplyr)
library(tidyr)

df1 %>% 
  mutate(mycols = substr(gsub('.[[:digit:]]+', '', rownames(.)), 5, 5),
         myrows = as.integer(as.factor(substr(rownames(.),7,7)))-1) %>% 
  spread(key=mycols, value=Value)

#>   myrows  K  W  Z
#> 1      0 20 30 10
#> 2      1  5 30 20
#> 3      2 23 44  3

Data:

df1 <- structure(list(Value = c(10, 20, 30, 20, 5, 30, 3, 23, 44)), 
                 row.names = c("X.Y.Z", "X.Y.K", "X.Y.W", "X.Y.Z.1", 
                               "X.Y.K.1", "X.Y.W.1", "X.Y.Z.2", "X.Y.K.2", "X.Y.W.2"), 
                 class = "data.frame")

Update I:

As I said in my comment, we need to clean the data since $..1 column causes problems for dplyr. Here's a solution using the exact data provided in the question:

df1 <- structure(list(..1 = c("X.Y.Z", "X.Y.K", "X.Y.W", "X.Y.Z.1", 
                              "X.Y.K.1", "X.Y.W.1", "X.Y.Z.2", "X.Y.K.2", "X.Y.W.2"), 
                      Value = c(10, 20, 30, 20, 5, 30, 3, 23, 44)), 
                      class = "data.frame", row.names = c(NA, -9L))

library(dplyr)
library(janitor)
library(tidyr)

clean_names(df1) %>% 
  mutate(mycols = substr(gsub('.[[:digit:]]+', '', x1), 5, 5),
         myrows = as.integer(as.factor(substr(x1,7,7)))-1) %>% 
  select(-x1) %>% 
  spread(key=mycols, value=value)

#>   myrows  K  W  Z
#> 1      0 20 30 10
#> 2      1  5 30 20
#> 3      2 23 44  3

^{Created on 2019-07-29 by the reprex package (v0.3.0)}

Update II:

Couple more approaches to see if they work for OP's dataset. (Without a reproducible example, it's hard, if not impossible, to resolve this; so, these are my final efforts).

library(dplyr)
library(tibble)
library(tidyr)

df1 %>% 
  rownames_to_column %>% 
  mutate(mycols = gsub('.[[:digit:]]+', '', rowname),
         myrows = regmatches(rowname, gregexpr('[0-9]+',rowname))) %>% 
  select(-rowname) %>% 
  spread(key=mycols, value=Value)

or

df1 %>% 
  rownames_to_column %>% 
  separate(rowname,sep = "\.", into = c("A1","B2","C3", "D4")) %>% 
  select(-A1,-B2) %>% 
  spread(key=C3, value=Value)

Use R dplyr to tidy data frame

Answers (2)

Solution that seems to work for OP:

First Version of my Answer:

Update I:

Update II:

Related Questions