user2062207
user2062207

Reputation: 955

Combining multiple columns of values together in R

So I scripted a code that allows me to bin values in a single column. However, the problem I have is that my data is contained in multiple columns, all of varying length and with NAs in it. The script I have for binning goes like this:-

bin <- seq(min(data[, 1]), max(data[, 1]), by = 0.0005)
binnedData <- tapply(data[, 1], cut(data[, 1], breaks = bin), median)

I was wondering whether there was a way I could combine all the values from all columns into one massive long column so I can basically run this, or whether there's a way in which I can tweak this so that I can run it on my 1000 x 1000 matrix

This is taken from part of my data:-

102.23144 123.23242 102.23145
103.23144 123.23242 102.36563
103.83637        NA 102.36356
104.23225        NA 102.23423
105.87890        NA        NA

And the expected outcome with just combining column values:-

102.23144 
103.23144 
103.83637      
104.23225   
105.87890
123.23242
123.23242
102.23145
102.36563
102.36356
102.23423

And outcome with binning:-

(102.0000 - 102.0005) - Median of all values that fall into bin
(102.0005 - 102.0010) - Median of all values that fall into bin
(102.0015 - 102.0020) - Median of all values that fall into bin

Thanks

Upvotes: 0

Views: 700

Answers (2)

amitm
amitm

Reputation: 26

have you tried using melt function from reshape2 package.

Here is some of your test data:

test <- data.frame(V1=c(102.2314,103.2314,103.8364,104.2322,105.8789),
                   V2=c(123.2324,123.2324,NA,NA,NA),
                   V3=c(102.2314,102.3656,102.3636,102.2342,NA)
                  )

> test
        V1       V2       V3
1 102.2314 123.2324 102.2314
2 103.2314 123.2324 102.3656
3 103.8364       NA 102.3636
4 104.2322       NA 102.2342
5 105.8789       NA       NA

and then use melt function -

test_m <- melt(test)

But there are NAs.

> test_m
   variable    value
1        V1 102.2314
2        V1 103.2314
3        V1 103.8364
4        V1 104.2322
5        V1 105.8789
6        V2 123.2324
7        V2 123.2324
8        V2       NA
9        V2       NA
10       V2       NA
11       V3 102.2314
12       V3 102.3656
13       V3 102.3636
14       V3 102.2342
15       V3       NA

So, now same step with this filter -

test_m<- melt(test)[which(!(is.na(melt(test)[,2]))),]

> test_m
   variable    value
1        V1 102.2314
2        V1 103.2314
3        V1 103.8364
4        V1 104.2322
5        V1 105.8789
6        V2 123.2324
7        V2 123.2324
11       V3 102.2314
12       V3 102.3656
13       V3 102.3636
14       V3 102.2342

So, NAs removed. You can select only the 2nd col of the data remove the variable name col.

Upvotes: 1

akrun
akrun

Reputation: 886948

If you have a data.frame, this could be done with stack from base R

na.omit(stack(test))[,1,drop=FALSE]
#    values
#1  102.2314
#2  103.2314
#3  103.8364
#4  104.2322
#5  105.8789
#6  123.2324
#7  123.2324
#11 102.2314
#12 102.3656
#13 102.3636
#14 102.2342

Or

data.frame(V1= unname(na.omit(unlist(test))))

data

test <- structure(list(V1 = c(102.2314, 103.2314, 103.8364, 104.2322, 
105.8789), V2 = c(123.2324, 123.2324, NA, NA, NA), V3 = c(102.2314, 
102.3656, 102.3636, 102.2342, NA)), .Names = c("V1", "V2", "V3"
), class = "data.frame", row.names = c("1", "2", "3", "4", "5"))

Upvotes: 1

Related Questions