User15Vim
User15Vim

Reputation: 75

Tidy data, issue with gather()

I have the below data frame. I was wondering, what is the best way to make this tidy so I can plot using ggplot2. I've tried using the gather() function in R, to do something like this:

tidy_cost <- gather(cost, key = 'Utility', value = 'unit', -c(Date, `Gas price`, `Electricity price`))

then:

tidy_cost <- gather(tidy_cost, key = 'Price', value = 'GBP', -c(Date, Utility, unit))

but what the above does is dublicate everything as for some cells there is a missmatch between 'Utility' and 'Price'....see table 2. Any advice would be greately appreciated. Thanks.

Table snippet

Table2

code to reproduce sample of original data frame :

structure(list(Date = structure(c(18686, 18714, 18745, 18775, 
18806), class = "Date"), `Gas usage` = c(0, 125, 104, 98, 43), 
    `Gas price` = c(0, 45.46, 39.67, 38.32, 19.18), `Electricity usage` = c(0, 
    208, 232, 226, 166), `Electricity price` = c(0, 37.82, 43.09, 
    42.98, 32.2)), row.names = c(NA, -5L), class = c("tbl_df", 
"tbl", "data.frame"))

Upvotes: 0

Views: 72

Answers (1)

r2evans
r2evans

Reputation: 160817

I'm not sure how you're planning to come up with GBP, but is this what you're trying to do?

tidyr::pivot_longer(cost, -Date, names_pattern = "(.*) (.*)", names_to = c("Utility", ".value"))
# # A tibble: 10 x 4
#    Date       Utility     usage price
#    <date>     <chr>       <dbl> <dbl>
#  1 2021-02-28 Gas             0   0  
#  2 2021-02-28 Electricity     0   0  
#  3 2021-03-28 Gas           125  45.5
#  4 2021-03-28 Electricity   208  37.8
#  5 2021-04-28 Gas           104  39.7
#  6 2021-04-28 Electricity   232  43.1
#  7 2021-05-28 Gas            98  38.3
#  8 2021-05-28 Electricity   226  43.0
#  9 2021-06-28 Gas            43  19.2
# 10 2021-06-28 Electricity   166  32.2

(I don't think tidyr::gather is going to be able to do this, one reason why it is superseded by pivot_longer.)

Upvotes: 1

Related Questions