Reputation: 17
I have data that is formatted this way when I pull it for the web data base.
site | date | time | parameter_code | remark_code | result |
---|---|---|---|---|---|
001 | 01-01-2020 | 15:10 | 00010 | NA | 16.6 |
001 | 01-01-2020 | 15:10 | 00095 | NA | 521.0 |
001 | 01-01-2020 | 15:10 | 00300 | NA | 5.6 |
001 | 01-01-2020 | 15:10 | 34475 | < | 1.0 |
001 | 03-30-2020 | 09:45 | 00010 | NA | 18.0 |
001 | 03-30-2020 | 09:45 | 00095 | NA | 546.0 |
001 | 03-30-2020 | 09:45 | 00300 | NA | 3.7 |
001 | 03-30-2020 | 09:45 | 34475 | NA | 2.3 |
I want to format it as shown below. I know that I need to combine the columns of Remark_code and Result first, but I'm not sure how to parse out the Parameter_code column into individual columns with the Result filling in the "body" of the table.
site | date | time | 00010 | 00095 | 00300 | 34475 |
---|---|---|---|---|---|---|
001 | 01-01-2020 | 15:10 | 16.6 | 521.0 | 5.6 | <1.0 |
001 | 03-30-2020 | 15:10 | 18.0 | 546.0 | 3.7 | 2.3 |
I don't necessarily need all of the code on how to do this, but just direction on what functions to use. I've been struggling with this since I'm not even sure what key words to look up or what this type of transformation(?) this would be called. Any help would be appreciated.
Upvotes: 1
Views: 94
Reputation: 886938
We can use pivot_wider
after unite
ing the columns 'REMARK_CODE' and 'RESULT'
library(dplyr)
library(tidyr)
df1 %>%
unite(RESULT, REMARK_CODE, RESULT, sep="", na.rm = TRUE) %>%
pivot_wider(names_from = PARAMETER_CODE, values_from = RESULT)
-output
# A tibble: 2 x 7
# SITE DATE TIME `10` `95` `300` `34475`
# <chr> <chr> <chr> <chr> <chr> <chr> <chr>
# 1 001 1/1/2020 15:10 16.6 521 5.6 <1
# 2 001 3/30/2020 9:45 18 546 3.7 2.3
df1 <- structure(list(SITE = c("001", "001", "001", "001", "001", "001",
"001", "001"), DATE = c("1/1/2020", "1/1/2020", "1/1/2020", "1/1/2020",
"3/30/2020", "3/30/2020", "3/30/2020", "3/30/2020"), TIME = c("15:10",
"15:10", "15:10", "15:10", "9:45", "9:45", "9:45", "9:45"), PARAMETER_CODE = c(10L,
95L, 300L, 34475L, 10L, 95L, 300L, 34475L), REMARK_CODE = c(NA,
NA, NA, "<", NA, NA, NA, NA), RESULT = c(16.6, 521, 5.6, 1, 18,
546, 3.7, 2.3)), row.names = c(NA, -8L), class = "data.frame")
Upvotes: 0
Reputation: 39585
Another approach can be:
library(dplyr)
library(tidyr)
#Code
new <- df %>% mutate(RESULT=ifelse(is.na(REMARK_CODE),paste0('',RESULT),
paste0(REMARK_CODE,RESULT))) %>%
select(-REMARK_CODE) %>%
pivot_wider(names_from = PARAMETER_CODE,values_from=RESULT)
Output:
# A tibble: 2 x 7
SITE DATE TIME `10` `95` `300` `34475`
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 001 1/1/2020 15:10 16.6 521 5.6 <1
2 001 3/30/2020 9:45 18 546 3.7 2.3
Some data used:
#Data
df <- structure(list(SITE = c("001", "001", "001", "001", "001", "001",
"001", "001"), DATE = c("1/1/2020", "1/1/2020", "1/1/2020", "1/1/2020",
"3/30/2020", "3/30/2020", "3/30/2020", "3/30/2020"), TIME = c("15:10",
"15:10", "15:10", "15:10", "9:45", "9:45", "9:45", "9:45"), PARAMETER_CODE = c(10L,
95L, 300L, 34475L, 10L, 95L, 300L, 34475L), REMARK_CODE = c(NA,
NA, NA, "<", NA, NA, NA, NA), RESULT = c(16.6, 521, 5.6, 1, 18,
546, 3.7, 2.3)), row.names = c(NA, -8L), class = "data.frame")
Upvotes: 1