Cbuckley
Cbuckley

Reputation: 17

Reorganizing Data in a data table

I have data that is formatted this way when I pull it for the web data base.

site date time parameter_code remark_code result
001 01-01-2020 15:10 00010 NA 16.6
001 01-01-2020 15:10 00095 NA 521.0
001 01-01-2020 15:10 00300 NA 5.6
001 01-01-2020 15:10 34475 < 1.0
001 03-30-2020 09:45 00010 NA 18.0
001 03-30-2020 09:45 00095 NA 546.0
001 03-30-2020 09:45 00300 NA 3.7
001 03-30-2020 09:45 34475 NA 2.3

I want to format it as shown below. I know that I need to combine the columns of Remark_code and Result first, but I'm not sure how to parse out the Parameter_code column into individual columns with the Result filling in the "body" of the table.

site date time 00010 00095 00300 34475
001 01-01-2020 15:10 16.6 521.0 5.6 <1.0
001 03-30-2020 15:10 18.0 546.0 3.7 2.3

I don't necessarily need all of the code on how to do this, but just direction on what functions to use. I've been struggling with this since I'm not even sure what key words to look up or what this type of transformation(?) this would be called. Any help would be appreciated.

Upvotes: 1

Views: 94

Answers (2)

akrun
akrun

Reputation: 886938

We can use pivot_wider after uniteing the columns 'REMARK_CODE' and 'RESULT'

library(dplyr)
library(tidyr)
df1 %>%
  unite(RESULT, REMARK_CODE, RESULT, sep="", na.rm = TRUE) %>%
   pivot_wider(names_from = PARAMETER_CODE, values_from = RESULT)

-output

# A tibble: 2 x 7
#   SITE  DATE      TIME  `10`  `95`  `300` `34475`   
#   <chr> <chr>     <chr> <chr> <chr> <chr> <chr>  
# 1 001   1/1/2020  15:10 16.6  521   5.6   <1     
#   2 001   3/30/2020 9:45  18    546   3.7   2.3    

data

df1 <- structure(list(SITE = c("001", "001", "001", "001", "001", "001", 
"001", "001"), DATE = c("1/1/2020", "1/1/2020", "1/1/2020", "1/1/2020", 
"3/30/2020", "3/30/2020", "3/30/2020", "3/30/2020"), TIME = c("15:10", 
"15:10", "15:10", "15:10", "9:45", "9:45", "9:45", "9:45"), PARAMETER_CODE = c(10L, 
95L, 300L, 34475L, 10L, 95L, 300L, 34475L), REMARK_CODE = c(NA, 
NA, NA, "<", NA, NA, NA, NA), RESULT = c(16.6, 521, 5.6, 1, 18, 
546, 3.7, 2.3)), row.names = c(NA, -8L), class = "data.frame")

Upvotes: 0

Duck
Duck

Reputation: 39585

Another approach can be:

library(dplyr)
library(tidyr)
#Code
new <- df %>% mutate(RESULT=ifelse(is.na(REMARK_CODE),paste0('',RESULT),
                            paste0(REMARK_CODE,RESULT))) %>%
  select(-REMARK_CODE) %>%
  pivot_wider(names_from = PARAMETER_CODE,values_from=RESULT)

Output:

# A tibble: 2 x 7
  SITE  DATE      TIME  `10`  `95`  `300` `34475`
  <chr> <chr>     <chr> <chr> <chr> <chr> <chr>  
1 001   1/1/2020  15:10 16.6  521   5.6   <1     
2 001   3/30/2020 9:45  18    546   3.7   2.3    

Some data used:

#Data
df <- structure(list(SITE = c("001", "001", "001", "001", "001", "001", 
"001", "001"), DATE = c("1/1/2020", "1/1/2020", "1/1/2020", "1/1/2020", 
"3/30/2020", "3/30/2020", "3/30/2020", "3/30/2020"), TIME = c("15:10", 
"15:10", "15:10", "15:10", "9:45", "9:45", "9:45", "9:45"), PARAMETER_CODE = c(10L, 
95L, 300L, 34475L, 10L, 95L, 300L, 34475L), REMARK_CODE = c(NA, 
NA, NA, "<", NA, NA, NA, NA), RESULT = c(16.6, 521, 5.6, 1, 18, 
546, 3.7, 2.3)), row.names = c(NA, -8L), class = "data.frame")

Upvotes: 1

Related Questions