fullera
fullera

Reputation: 204

How to pull string from character in one dataframe and place into a new table

I am working on a new shiny project and trying to re-use some of my collegues work that he has done in SQL to speed up the time it takes to build the data for this app.

I don't exactly know how to describe this problem so I will do so by showing what I have and explaining what I want to get.

Essentially we have an SQL script which spits out a bunch of data into two columns.

  1. Is an identifier column which in the past we use vlookup to split the string components and fill in cells in excel.
  2. Is the value of said identifier whether it be counts, averages or percentages.

It would look like the following.

                                            lookup output
    1:                                 dataAU20161    142
    2:                                 dataAU20171    246
    3:                                 dataAU20181     17
    4:                                 dataAU20191      3
    5:                                 dataAU20162    193
    6:                                 dataAU20172    203
    7:                                 dataAU20182     11
    8:                                 dataAU20192      9

So ideally I would like to transform this data into the following format where the 'data' string identifies that they will go into the same data frame. The years in the string will be implemented into columns and the number following the years (1 or 2) will be implemented as a column as a factor variable.

x   2016    2017    2018    2019
--------------------------------
1   142     246     17      3
2   193     203     11      9

Any help with this would be greatly appreciated!

Upvotes: 1

Views: 131

Answers (1)

akrun
akrun

Reputation: 887193

An option would to be separate thecolumn 'lookup'into two

library(dplyr)
library(tidyr)
df1 %>%
    extract(lookup, into = c('lookup', 'rn'), 'dataAU(\\d{4})(\\d{1})') %>%
    pivot_wider(names_from= lookup, values_from =output) %>%
    dplyr::select(-rn)
# A tibble: 2 x 4
#  `2016` `2017` `2018` `2019`
#   <int>  <int>  <int>  <int>
#1    142    246     17      3
#2    193    203     11      9

data

df1 <- structure(list(lookup = c("dataAU20161", "dataAU20171", "dataAU20181", 
"dataAU20191", "dataAU20162", "dataAU20172", "dataAU20182", "dataAU20192"
), output = c(142L, 246L, 17L, 3L, 193L, 203L, 11L, 9L)), class = "data.frame", 
row.names = c("1:", 
"2:", "3:", "4:", "5:", "6:", "7:", "8:"))

Upvotes: 2

Related Questions