Reputation: 274
I need to generate bins from a data.frame based on the values of one column. I have tried the function "cut".
For example: I want to create bins of air temperature values in the column "AirTDay" in a data frame:
AirTDay (oC)
8.16
10.88
5.28
19.82
23.62
13.14
28.84
32.21
17.44
31.21
I need the bin intervals to include all values in a range of 2 degrees centigrade from that initial value (i.e. 8-9.99, 10-11.99, 12-13.99...), to be labelled with the average value of the range (i.e. 9.5, 10.5, 12.5...), and to respect blank cells, returning "NA" in the bins column.
The output should look as:
Air_T (oC) TBins
8.16 8.5
10.88 10.5
5.28 NA
NA
19.82 20.5
23.62 24.5
13.14 14.5
NA
NA
28.84 28.5
32.21 32.5
17.44 18.5
31.21 32.5
I've gotten as far as:
setwd('C:/Users/xxx')
temp_data <- read.csv("temperature.csv", sep = ",", header = TRUE)
TAir <- temp_data$AirTDay
Tmin <- round(min(TAir, na.rm = FALSE), digits = 0) # is start at minimum value
Tmax <- round(max(TAir, na.rm = FALSE), digits = 0)
int <- 2 # bin ranges 2 degrees
mean_int <- int/2
int_range <- seq(Tmin, Tmax + int, int) # generate bin sequence
bin_label <- seq(Tmin + mean_int, Tmax + mean_int, int) # generate labels
temp_data$TBins <- cut(TAir, breaks = int_range, ordered_result = FALSE, labels = bin_label)
The output table looks correct, but for some reason it shows a sequential additional column, shifts column names, and collapse all values eliminating blank cells. Something like this:
Air_T (oC) TBins
1 8.16 8.5
2 10.88 10.5
3 5.28 NA
4 19.82 20.5
5 23.62 24.5
6 13.14 14.5
7 28.84 28.5
8 32.21 32.5
9 17.44 18.5
10 31.21 32.5
Any ideas on where am I failing and how to solve it?
Upvotes: 0
Views: 268
Reputation: 79328
v<-ceiling(max(dat$V1,na.rm=T))
breaks<-seq(8,v,2)
labels=seq(8.5,length.out=length(s)-1,by=2)
transform(dat,Tbins=cut(V1,breaks,labels))
V1 Tbins
1 8.16 8.5
2 10.88 10.5
3 5.28 <NA>
4 NA <NA>
5 19.82 18.5
6 23.62 22.5
7 13.14 12.5
8 NA <NA>
9 NA <NA>
10 28.84 28.5
11 32.21 <NA>
12 17.44 16.5
13 31.21 30.5
This result follows the logic given: we have
paste(seq(8,v,2),seq(9.99,v,by=2),sep="-")
[1] "8-9.99" "10-11.99" "12-13.99" "14-15.99" "16-17.99" "18-19.99" "20-21.99"
[8] "22-23.99" "24-25.99" "26-27.99" "28-29.99" "30-31.99"
From this we can tell that 19.82
will lie between 18
and 20
thus given the value 18.5
, similar to 10.88
being between 10-11.99
thus assigned the value 10.5
Upvotes: 1