Reputation: 955
So I scripted a code that allows me to bin values in a single column. However, the problem I have is that my data is contained in multiple columns, all of varying length and with NAs in it. The script I have for binning goes like this:-
bin <- seq(min(data[, 1]), max(data[, 1]), by = 0.0005)
binnedData <- tapply(data[, 1], cut(data[, 1], breaks = bin), median)
I was wondering whether there was a way I could combine all the values from all columns into one massive long column so I can basically run this, or whether there's a way in which I can tweak this so that I can run it on my 1000 x 1000 matrix
This is taken from part of my data:-
102.23144 123.23242 102.23145
103.23144 123.23242 102.36563
103.83637 NA 102.36356
104.23225 NA 102.23423
105.87890 NA NA
And the expected outcome with just combining column values:-
102.23144
103.23144
103.83637
104.23225
105.87890
123.23242
123.23242
102.23145
102.36563
102.36356
102.23423
And outcome with binning:-
(102.0000 - 102.0005) - Median of all values that fall into bin
(102.0005 - 102.0010) - Median of all values that fall into bin
(102.0015 - 102.0020) - Median of all values that fall into bin
Thanks
Upvotes: 0
Views: 700
Reputation: 26
have you tried using melt function from reshape2 package.
Here is some of your test data:
test <- data.frame(V1=c(102.2314,103.2314,103.8364,104.2322,105.8789),
V2=c(123.2324,123.2324,NA,NA,NA),
V3=c(102.2314,102.3656,102.3636,102.2342,NA)
)
> test
V1 V2 V3
1 102.2314 123.2324 102.2314
2 103.2314 123.2324 102.3656
3 103.8364 NA 102.3636
4 104.2322 NA 102.2342
5 105.8789 NA NA
and then use melt function -
test_m <- melt(test)
But there are NAs.
> test_m
variable value
1 V1 102.2314
2 V1 103.2314
3 V1 103.8364
4 V1 104.2322
5 V1 105.8789
6 V2 123.2324
7 V2 123.2324
8 V2 NA
9 V2 NA
10 V2 NA
11 V3 102.2314
12 V3 102.3656
13 V3 102.3636
14 V3 102.2342
15 V3 NA
So, now same step with this filter -
test_m<- melt(test)[which(!(is.na(melt(test)[,2]))),]
> test_m
variable value
1 V1 102.2314
2 V1 103.2314
3 V1 103.8364
4 V1 104.2322
5 V1 105.8789
6 V2 123.2324
7 V2 123.2324
11 V3 102.2314
12 V3 102.3656
13 V3 102.3636
14 V3 102.2342
So, NAs removed. You can select only the 2nd col of the data remove the variable name col.
Upvotes: 1
Reputation: 886948
If you have a data.frame
, this could be done with stack
from base R
na.omit(stack(test))[,1,drop=FALSE]
# values
#1 102.2314
#2 103.2314
#3 103.8364
#4 104.2322
#5 105.8789
#6 123.2324
#7 123.2324
#11 102.2314
#12 102.3656
#13 102.3636
#14 102.2342
Or
data.frame(V1= unname(na.omit(unlist(test))))
test <- structure(list(V1 = c(102.2314, 103.2314, 103.8364, 104.2322,
105.8789), V2 = c(123.2324, 123.2324, NA, NA, NA), V3 = c(102.2314,
102.3656, 102.3636, 102.2342, NA)), .Names = c("V1", "V2", "V3"
), class = "data.frame", row.names = c("1", "2", "3", "4", "5"))
Upvotes: 1