Ariya Woodard
Ariya Woodard

Reputation: 59

How to convert long form to wide form based on category in R

I have the following data.

   name     x1    x2    x3    x4
 1 V1_3  1     0     999     999
 2 V2_3  1.12  0.044  25.4     0
 3 V3_3  0.917 0.045  20.4     0
 4 V1_15 1     0     999     999
 5 V2_15 1.07  0.036  29.8     0
 6 V3_15 0.867 0.039  22.5     0
 7 V1_25 1     0     999     999
 8 V2_25 1.07  0.034  31.1     0
 9 V3_25 0.917 0.037  24.6     0
10 V1_35 1     0     999     999
11 V2_35 1.05  0.034  31.2     0
12 V3_35 0.994 0.037  26.6     0
13 V1_47 1     0     999     999
14 V2_47 1.03  0.031  33.6     0
15 V3_47 0.937 0.034  27.4     0
16 V1_57 1     0     999     999
17 V2_57 1.13  0.036  31.9     0
18 V3_57 1.03  0.037  28.1     0

I want to convert this data to the following data. Can someone give me some suggestion, please?

  name  est_3 est_15 est_25 est_35 est_47 est_57
1 V2    1.12   1.07   1.07   1.05   1.03    1.13
2 V3    0.917  0.867  0.917  0.994  0.937   1.03

Upvotes: 0

Views: 87

Answers (1)

jazzurro
jazzurro

Reputation: 23574

Here is one approach for you. Your data is called mydf here. First, you want to choose necessary columns (i.e., name and x1) using select(). Then, you want to subset rows using filter(). You want to grab rows that begin with V2 or V3 in strings. grepl() checks if each string has the pattern. Then, you want to split the column, name and create two columns (i.e., name and est). Finally, you want to convert the data to a long-format data using pivot_wider().

library(dplyr)
library(tidyr)

select(mydf, name:x1) %>%            
filter(grepl(x = name, pattern = "^V[2|3]")) %>% 
separate(col = name, into = c("name", "est"), sep = "_") %>%
pivot_wider(names_from = "est",values_from = "x1", names_prefix = "est_")

#  name  est_3 est_15 est_25 est_35 est_47 est_57
#  <chr> <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
#1 V2    1.12   1.07   1.07   1.05   1.03    1.13
#2 V3    0.917  0.867  0.917  0.994  0.937   1.03

For your reference, when you ask questions, you want to provide a minimal sample data and code. If you can do that, SO users can easily help you out. Please read this question.

DATA

mydf <- structure(list(name = c("V1_3", "V2_3", "V3_3", "V1_15", "V2_15", 
"V3_15", "V1_25", "V2_25", "V3_25", "V1_35", "V2_35", "V3_35", 
"V1_47", "V2_47", "V3_47", "V1_57", "V2_57", "V3_57"), x1 = c(1, 
1.122, 0.917, 1, 1.069, 0.867, 1, 1.066, 0.917, 1, 1.048, 0.994, 
1, 1.03, 0.937, 1, 1.133, 1.032), x2 = c(0, 0.044, 0.045, 0, 
0.036, 0.039, 0, 0.034, 0.037, 0, 0.034, 0.037, 0, 0.031, 0.034, 
0, 0.036, 0.037), x3 = c(999, 25.446, 20.385, 999, 29.751, 22.478, 
999, 31.134, 24.565, 999, 31.18, 26.587, 999, 33.637, 27.405, 
999, 31.883, 28.081), x4 = c(999, 0, 0, 999, 0, 0, 999, 0, 0, 
999, 0, 0, 999, 0, 0, 999, 0, 0)), row.names = c(NA, -18L), class = c("tbl_df", 
"tbl", "data.frame"))

Upvotes: 1

Related Questions