Reputation: 75
I have the below data frame. I was wondering, what is the best way to make this tidy so I can plot using ggplot2. I've tried using the gather() function in R, to do something like this:
tidy_cost <- gather(cost, key = 'Utility', value = 'unit', -c(Date, `Gas price`, `Electricity price`))
then:
tidy_cost <- gather(tidy_cost, key = 'Price', value = 'GBP', -c(Date, Utility, unit))
but what the above does is dublicate everything as for some cells there is a missmatch between 'Utility' and 'Price'....see table 2. Any advice would be greately appreciated. Thanks.
code to reproduce sample of original data frame :
structure(list(Date = structure(c(18686, 18714, 18745, 18775,
18806), class = "Date"), `Gas usage` = c(0, 125, 104, 98, 43),
`Gas price` = c(0, 45.46, 39.67, 38.32, 19.18), `Electricity usage` = c(0,
208, 232, 226, 166), `Electricity price` = c(0, 37.82, 43.09,
42.98, 32.2)), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))
Upvotes: 0
Views: 72
Reputation: 160817
I'm not sure how you're planning to come up with GBP
, but is this what you're trying to do?
tidyr::pivot_longer(cost, -Date, names_pattern = "(.*) (.*)", names_to = c("Utility", ".value"))
# # A tibble: 10 x 4
# Date Utility usage price
# <date> <chr> <dbl> <dbl>
# 1 2021-02-28 Gas 0 0
# 2 2021-02-28 Electricity 0 0
# 3 2021-03-28 Gas 125 45.5
# 4 2021-03-28 Electricity 208 37.8
# 5 2021-04-28 Gas 104 39.7
# 6 2021-04-28 Electricity 232 43.1
# 7 2021-05-28 Gas 98 38.3
# 8 2021-05-28 Electricity 226 43.0
# 9 2021-06-28 Gas 43 19.2
# 10 2021-06-28 Electricity 166 32.2
(I don't think tidyr::gather
is going to be able to do this, one reason why it is superseded by pivot_longer
.)
Upvotes: 1