user3542209
user3542209

Reputation: 13

Conditional calculating the numbers of values in column with R, part2

I have three vectors:

x <- c(1,1,1,1,1, 2,2,2,3,3,  3,3,3,4,4,  5,5,5,5,5 )
y <- c(2,2,1,3,2, 1,4,2,2,NA, 3,3,3,4,NA, 1,4,4,2,NA)
w <- c(1,45,NA,45,NA,45,41,45,96,25,12,NA,7,NA,4,45,12,45,32,56)

How can I find the number of values in W (don`t count NA) for each X (from 1 to 5) and for each Y (from 1 to 4)?

The output should be in format like:

y x  result
4 1   ...
4 2   ...
4 3
4 4
4 5
3 1
3 2
3 3
3 4
3 5
...
1 1
1 2
1 3
1 4 
1 5

Upvotes: 1

Views: 102

Answers (2)

MrFlick
MrFlick

Reputation: 206382

Here you can use xtabs to sum up the values where w is not NA

 dd<-as.data.frame(xtabs(!is.na(w)~y+x), stringsAsFactors=F)

the as.data.frame part changes it form a table to a long format as you desire. The only catch is that xtabs converts x and y to characters. You can convert them back to numeric with

dd$x <- as.numeric(dd$x)
dd$y <- as.numeric(dd$y)

and then sort with

dd <- dd[order(-dd$y,dd$x),]

Upvotes: 1

AndrewMacDonald
AndrewMacDonald

Reputation: 2950

Here is a solution using dplyr

library(dplyr)
dat <- data.frame(
  x = c(1,1,1,1,1, 2,2,2,3,3,  3,3,3,4,4,  5,5,5,5,5 ),
  y = c(2,2,1,3,2, 1,4,2,2,NA, 3,3,3,4,NA, 1,4,4,2,NA),
  w = c(1,45,NA,45,NA,45,41,45,96,25,12,NA,7,NA,4,45,12,45,32,56)
) 

dat %>%
  filter(!is.na(w)) %>%
  filter(!is.na(y)) %>%
  group_by(y,x) %>%
  summarise(result = n())

   y x result
1  1 2      1
2  1 5      1
3  2 1      2
4  2 2      1
5  2 3      1
6  2 5      1
7  3 1      1
8  3 3      2
9  4 2      1
10 4 5      2

The above simply gives the number of non-NA values of w for all non-NA levels of x and y If you want to see all combinations of non-NA levels of x and y, you can make them first by using expand.grid

foo <- dat %>%
  filter(!is.na(w)) %>%
  filter(!is.na(y)) %>%
  group_by(y,x) %>%
  summarise(result = sum(!is.na(w)))

with(dat, expand.grid(x = unique(x),
                      y = unique(y))) %>%
  filter(!is.na(y)) %>%
  left_join(foo) %>%
  arrange(desc(y))

   x y result
1  1 4     NA
2  2 4      1
3  3 4     NA
4  4 4     NA
5  5 4      2
6  1 3      1
7  2 3     NA
8  3 3      2
9  4 3     NA
10 5 3     NA
11 1 2      2
12 2 2      1
13 3 2      1
14 4 2     NA
15 5 2      1
16 1 1     NA
17 2 1      1
18 3 1     NA
19 4 1     NA
20 5 1      1

This technique gives you NA for "results" wherever the factor combination of x and y doesn't exist

Upvotes: 0

Related Questions