Using dplyr function group_by() with cut()

Question

I have a data set of real estate data. I'm trying to create a new column of days on market groups (labeled DOM_Groups) and group them into 15-day intervals (i.e. 0-14, 15-29, etc.). Then I'm trying to summarize() these groupings by the count of observations and the average sale price for each 15-day group.

I'm using the cut() function attempting to break my DOM_Groups into these 15-day intervals. In the base spreadsheet that I imported, the column containing the days on market has a unique observation in each cell, and the data in that column are numeric whole numbers...no decimals, no negative numbers.

When I run the following code, the tibble output is not grouping correctly, and it is including a negative number with a decimal, which does not exist in my data set. I'm not sure what to do to correct this.

gibbsMkt %>% 
  mutate(DOM_Groups = cut(DOM, breaks = 15, dig.lab = 2)) %>% 
  filter(Status == "SOLD") %>% 
  group_by(DOM_Groups) %>% 
  summarize(numDOM = n(),
            avgSP = mean(`Sold Price`, na.rm = TRUE))

The tibble output I get is this:


DOM_Groups        numDOM   avgSP
                   
1 (-0.23,16]            74 561675.
2 (16,31]               18 632241.
3 (31,47]               11 561727.
4 (47,63]                8 545862.
5 (63,78]                7 729286.
6 (78,94]                6 624167.
7 (1.4e+02,1.6e+02]      2 541000 
8 (1.6e+02,1.7e+02]      1 535395

Also, for rows 7 & 8 in the tibble, the largest number is 164, so I also don't understand why these rows are being converted to scientific notation.

When I use an Excel pivot table, I get the output that I want to reproduce in R, which is depicted below:

How can I reproduce this in R with the correct code?

dash2 · Accepted Answer

Here's a simple solution with my santoku package:

library(santoku)
gibbsMkt %>% 
  mutate(DOM_Groups = chop_width(DOM, 15, labels = lbl_dash("-")))

# then proceed as before

You can use the start argument to chop_width if you want to start the intervals at a particular number.

Using dplyr function group_by() with cut()

Answers (2)

Related Questions