ebb
ebb

Reputation: 274

R generate bins from a data frame respecting blanks

I need to generate bins from a data.frame based on the values of one column. I have tried the function "cut".

For example: I want to create bins of air temperature values in the column "AirTDay" in a data frame:

AirTDay (oC)    
8.16           
10.88          
5.28

19.82
23.62
13.14


28.84
32.21
17.44
31.21

I need the bin intervals to include all values in a range of 2 degrees centigrade from that initial value (i.e. 8-9.99, 10-11.99, 12-13.99...), to be labelled with the average value of the range (i.e. 9.5, 10.5, 12.5...), and to respect blank cells, returning "NA" in the bins column.

The output should look as:

Air_T (oC)    TBins 
8.16          8.5 
10.88         10.5 
5.28          NA 
              NA
19.82         20.5
23.62         24.5
13.14         14.5
              NA
              NA
28.84         28.5
32.21         32.5
17.44         18.5
31.21         32.5

I've gotten as far as:

setwd('C:/Users/xxx') 

temp_data <- read.csv("temperature.csv", sep = ",", header = TRUE)
TAir <- temp_data$AirTDay

Tmin <- round(min(TAir, na.rm = FALSE), digits = 0) # is start at minimum value
Tmax <- round(max(TAir, na.rm = FALSE), digits = 0)  
int <- 2 # bin ranges 2 degrees
mean_int <- int/2

int_range <- seq(Tmin, Tmax + int, int) # generate bin sequence 
bin_label <- seq(Tmin + mean_int, Tmax + mean_int, int) # generate labels

temp_data$TBins <- cut(TAir, breaks = int_range, ordered_result = FALSE, labels = bin_label)

The output table looks correct, but for some reason it shows a sequential additional column, shifts column names, and collapse all values eliminating blank cells. Something like this:

Air_T (oC)    TBins 
    1         8.16          8.5 
    2         10.88         10.5 
    3         5.28          NA 
    4         19.82         20.5
    5         23.62         24.5
    6         13.14         14.5
    7         28.84         28.5
    8         32.21         32.5
    9         17.44         18.5
   10         31.21         32.5

Any ideas on where am I failing and how to solve it?

Upvotes: 0

Views: 268

Answers (1)

Onyambu
Onyambu

Reputation: 79328

v<-ceiling(max(dat$V1,na.rm=T))
breaks<-seq(8,v,2)
labels=seq(8.5,length.out=length(s)-1,by=2)
transform(dat,Tbins=cut(V1,breaks,labels))
      V1 Tbins
1   8.16   8.5
2  10.88  10.5
3   5.28  <NA>
4     NA  <NA>
5  19.82  18.5
6  23.62  22.5
7  13.14  12.5
8     NA  <NA>
9     NA  <NA>
10 28.84  28.5
11 32.21  <NA>
12 17.44  16.5
13 31.21  30.5

This result follows the logic given: we have

paste(seq(8,v,2),seq(9.99,v,by=2),sep="-")
 [1] "8-9.99"   "10-11.99" "12-13.99" "14-15.99" "16-17.99" "18-19.99" "20-21.99"
 [8] "22-23.99" "24-25.99" "26-27.99" "28-29.99" "30-31.99"

From this we can tell that 19.82 will lie between 18 and 20 thus given the value 18.5, similar to 10.88 being between 10-11.99 thus assigned the value 10.5

Upvotes: 1

Related Questions