user882670
user882670

Reputation:

Use R dplyr to tidy data frame

My dataframe df looks like this:

        Value
X.Y.Z   10
X.Y.K   20
X.Y.W   30
X.Y.Z.1 20
X.Y.K.1 5
X.Y.W.1 30
X.Y.Z.2 3
X.Y.K.2 23
X.Y.W.2 44

I'm trying to unpivot using the 3rd character of the row names to name the columns, like:

enter image description here

So, the row names are now the last character of the rows (after the dot). I know this is possible to do with dplyr, I've tried gather, and spread, but no luck, can anyone help?

EDIT: Here's the data above in text, I:

structure(list(..1 = c("X.Y.Z", "X.Y.K", "X.Y.W", "X.Y.Z.1", 
"X.Y.K.1", "X.Y.W.1", "X.Y.Z.2", "X.Y.K.2", "X.Y.W.2"), Value = c(10, 
20, 30, 20, 5, 30, 3, 23, 44)), class = "data.frame", row.names = c(NA, 
-9L))

Upvotes: 1

Views: 287

Answers (2)

M--
M--

Reputation: 29203

Solution that seems to work for OP:

library(dplyr)
library(tibble)
library(tidyr)

df1 %>% 
    rownames_to_column %>% 
    transmute(mycols = gsub('^.*\\.', '', gsub('.[[:digit:]]+', '', rowname)),
              myrows = regmatches(rowname, gregexpr('[0-9]+',rowname)),
              value = Value) %>% 
    spread(key=mycols, value=value)
  #   myrows  K  W  Z
  # 1        20 30 10
  # 2      1  5 30 20
  # 3      2 23 44  3


First Version of my Answer:

library(dplyr)
library(tidyr)

df1 %>% 
  mutate(mycols = substr(gsub('.[[:digit:]]+', '', rownames(.)), 5, 5),
         myrows = as.integer(as.factor(substr(rownames(.),7,7)))-1) %>% 
  spread(key=mycols, value=Value)

#>   myrows  K  W  Z
#> 1      0 20 30 10
#> 2      1  5 30 20
#> 3      2 23 44  3

Data:

df1 <- structure(list(Value = c(10, 20, 30, 20, 5, 30, 3, 23, 44)), 
                 row.names = c("X.Y.Z", "X.Y.K", "X.Y.W", "X.Y.Z.1", 
                               "X.Y.K.1", "X.Y.W.1", "X.Y.Z.2", "X.Y.K.2", "X.Y.W.2"), 
                 class = "data.frame")


Update I:

As I said in my comment, we need to clean the data since $..1 column causes problems for dplyr. Here's a solution using the exact data provided in the question:

df1 <- structure(list(..1 = c("X.Y.Z", "X.Y.K", "X.Y.W", "X.Y.Z.1", 
                              "X.Y.K.1", "X.Y.W.1", "X.Y.Z.2", "X.Y.K.2", "X.Y.W.2"), 
                      Value = c(10, 20, 30, 20, 5, 30, 3, 23, 44)), 
                      class = "data.frame", row.names = c(NA, -9L))

library(dplyr)
library(janitor)
library(tidyr)

clean_names(df1) %>% 
  mutate(mycols = substr(gsub('.[[:digit:]]+', '', x1), 5, 5),
         myrows = as.integer(as.factor(substr(x1,7,7)))-1) %>% 
  select(-x1) %>% 
  spread(key=mycols, value=value)

#>   myrows  K  W  Z
#> 1      0 20 30 10
#> 2      1  5 30 20
#> 3      2 23 44  3

Created on 2019-07-29 by the reprex package (v0.3.0)



Update II:

Couple more approaches to see if they work for OP's dataset. (Without a reproducible example, it's hard, if not impossible, to resolve this; so, these are my final efforts).

library(dplyr)
library(tibble)
library(tidyr)
df1 %>% 
  rownames_to_column %>% 
  mutate(mycols = gsub('.[[:digit:]]+', '', rowname),
         myrows = regmatches(rowname, gregexpr('[0-9]+',rowname))) %>% 
  select(-rowname) %>% 
  spread(key=mycols, value=Value)

or

df1 %>% 
  rownames_to_column %>% 
  separate(rowname,sep = "\\.", into = c("A1","B2","C3", "D4")) %>% 
  select(-A1,-B2) %>% 
  spread(key=C3, value=Value)

Upvotes: 4

r2evans
r2evans

Reputation: 160792

Try this:

library(dplyr) # and tibble is needed, too
library(tidyr)
df %>%
  tibble::rownames_to_column() %>%
  mutate(
    type = gsub("(^X\\.Y\\.|\\.[0-9]$)", "", rowname),
    num = gsub("\\D", "", rowname)
  ) %>%
  select(-rowname) %>%
  spread(type, Value)
#   num  K  W  Z
# 1     20 30 10
# 2   1  5 30 20
# 3   2 23 44  3

BTW, from your question it looks like what you have are proper rownames, but your structure includes them as a column named ..1. I thought this might have been an artifact of trying to get the data into your question, so I removed it.

df <- structure(list(..1 = c("X.Y.Z", "X.Y.K", "X.Y.W", "X.Y.Z.1", 
"X.Y.K.1", "X.Y.W.1", "X.Y.Z.2", "X.Y.K.2", "X.Y.W.2"), Value = c(10, 
20, 30, 20, 5, 30, 3, 23, 44)), class = "data.frame", row.names = c(NA, -9L))
rownames(df) <- x$..1
df$..1 <- NULL

If this is not quite the case, then you can remove the call to tibble::rownames_to_column(). However, having ..1 in a dplyr chain isn't good (Error: Column 1 must not have names of the form ... or ..j.), so you'll need to rename it anyway.

Upvotes: 1

Related Questions