Extract part of string values, make new column names, and make dataframe wide

Question

I want to extract the last part of a string column (always wrapped in square brackets), make these the names of new columns, and then reshape the data from long to wide and fill the new columns with these values.

For example, if I have this dataframe:

whatihave <- data_frame(v1 = c('abc [effort]', 'def [effort]', 'ghi [effort]', 'abc [scope]', 'def [scope]', 'ghi [scope]'),
                        scores = c(1:6))

# A tibble: 6 x 2
  v1           scores
           
1 abc [effort]      1
2 def [effort]      2
3 ghi [effort]      3
4 abc [scope]       4
5 def [scope]       5
6 ghi [scope]       6

I want to transform it into this dataframe:

whatiwant <- data_frame(v1 = c('abc', 'def', 'ghi'), 
                        effort = c(1, 2, 3),
                        scope = c(4, 5, 6))

  v1    effort scope
     
1 abc        1     4
2 def        2     5
3 ghi        3     6

As you can see, the characters inside the square brackets at the end of the values in the v1 column have become the names of two new variables (effort and scope). The values from the scores column have then filled the new columns i've made.

How do I do this?

EDIT BELOW

The toy data I mocked up was missing a key feature of my actual data. There are actually multiple instances of the same v1 value and with the same suffix in square brackets.

As a result the answer below (which is otherwise excellent) results in lists of cols in each cell of effort and scope instead of individual values.

Let's imagine I have this data instead:

whatiactuallyhave <- data_frame(v1 = c('abc [effort]', 'abc [effort]', 'def [effort]', 'def [effort]', 'ghi [effort]', 'abc [scope]', 'abc [scope]', 'def [scope]', 'ghi [scope]', 'ghi [scope]'), 
                        scores = c('1', '2', '3', '4', '5', '6', '7', '8', '9', '10'))

# A tibble: 10 x 2
   v1           scores
            
 1 abc [effort] 1     
 2 abc [effort] 2     
 3 def [effort] 3     
 4 def [effort] 4     
 5 ghi [effort] 5     
 6 abc [scope]  6     
 7 abc [scope]  7     
 8 def [scope]  8     
 9 ghi [scope]  9     
10 ghi [scope]  10

And that I want to transform it to this:

whatiactuallywant <- data_frame(v1 = c('abc', 'abc', 'def', 'def', 'ghi', 'ghi'), 
                        effort = c('1', '2', '3', '4', '5', 'NA'),
                        scope = c('6', '7', '8', 'NA', '9', '10'))

# A tibble: 6 x 3
  v1    effort scope
     
1 abc   1      6    
2 abc   2      7    
3 def   3      8    
4 def   4      NA   
5 ghi   5      9    
6 ghi   NA     10

I hope that's clearer now! Thanks so much for your help.

AnilGoyal · Accepted Answer

Revised scenario

Using tidyr::extract will lead you saving one extra step of mutate as you can directly extract two desired strings into two columns using regex here.

library(tidyverse)
whatiactuallyhave <- data_frame(v1 = c('abc [effort]', 'abc [effort]', 'def [effort]', 'def [effort]', 'ghi [effort]', 'abc [scope]', 'abc [scope]', 'def [scope]', 'ghi [scope]', 'ghi [scope]'), 
                                scores = c('1', '2', '3', '4', '5', '6', '7', '8', '9', '10'))
#> Warning: `data_frame()` was deprecated in tibble 1.1.0.
#> Please use `tibble()` instead.

whatiactuallyhave %>%
  tidyr::extract(v1, into = c('v1', 'name'), regex = '(\w+)\s$$(\w+)$$') %>%
  group_by(v1, name) %>%
  mutate(d = row_number()) %>%
  pivot_wider(names_from = name, values_from = scores, values_fill = NA) %>%
  select(-d)

#> # A tibble: 6 x 3
#> # Groups:   v1 [3]
#>   v1    effort scope
#>      
#> 1 abc   1      6    
#> 2 abc   2      7    
#> 3 def   3      8    
#> 4 def   4       
#> 5 ghi   5      9    
#> 6 ghi      10

^{Created on 2021-05-26 by the reprex package (v2.0.0)}

Earlier answer

whatihave <- data_frame(v1 = c('abc [effort]', 'def [effort]', 'ghi [effort]', 'abc [scope]', 'def [scope]', 'ghi [scope]'),
                        scores = c(1:6))

library(tidyverse)
whatihave %>%
  separate(v1, into = c('v1', 'name'), sep = ' $$') %>%
  mutate(name = str_remove(name, '$$')) %>%
  pivot_wider(names_from = name, values_from = scores)


# A tibble: 3 x 3
  v1    effort scope
     
1 abc        1     4
2 def        2     5
3 ghi        3     6

Extract part of string values, make new column names, and make dataframe wide

Answers (2)

Revised scenario

Earlier answer

Related Questions