Edwin

Reputation: 961

Selecting a subset of data based on another column

I have a dataset which looks something like this:

     Area     Num
[1,] "Area 1" "99"  
[2,] "Area 3" "85"  
[3,] "Area 1" "60"  
[4,] "Area 2" "90"  
[5,] "Area 1" "40"  
[6,] "Area 3" NA    
[7,] "Area 4" "10" 
...

Reproducible data (output of dput()):

structure(c("Area 1", "Area 3", "Area 1", "Area 2", "Area 1", 
"Area 3", "Area 4", "99", "85", "60", "90", "40", NA, "10"), .Dim = c(7L, 
2L), .Dimnames = list(NULL, c("Area", "Num")))

I need to perform calculations on the values in Num for each Area, for example the sum of each Area or the summary() of each Area.

I was thinking of using a nested for loop to achieve this, but I'm not sure how.
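For comparison, the loop the question has in mind needs only one loop, not a nested one: iterate over the distinct areas and sum the matching rows. This is a minimal sketch that assumes the matrix above. The names m, num, areas and totals are illustrative. The answers below show more idiomatic vectorised alternatives.

```r
# character matrix from the question (as produced by dput)
m <- structure(c("Area 1", "Area 3", "Area 1", "Area 2", "Area 1",
                 "Area 3", "Area 4", "99", "85", "60", "90", "40", NA, "10"),
               .Dim = c(7L, 2L), .Dimnames = list(NULL, c("Area", "Num")))

num <- as.numeric(m[, "Num"])   # Num is stored as character
areas <- unique(m[, "Area"])    # distinct areas, in order of first appearance

totals <- numeric(length(areas))
names(totals) <- areas

# a single loop: for each area, select its rows and sum them, ignoring NA
for (a in areas) {
  totals[a] <- sum(num[m[, "Area"] == a], na.rm = TRUE)
}

totals
```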

Upvotes: 0

Views: 81

Answers (3)

Worice

Reputation: 4037

To apply a function to each level of the grouping factor, we can use the by() function:

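dt <- structure(c("Area 1", "Area 3", "Area 1", "Area 2", "Area 1",
                  "Area 3", "Area 4", "99", "85", "60", "90", "40", NA, "10"),
                .Dim = c(7L, 2L), .Dimnames = list(NULL, c("Area", "Num")))

# stringsAsFactors = FALSE keeps Num as character: before R 4.0.0 it would
# become a factor, and as.numeric() on a factor returns level codes, not values
dt <- data.frame(dt, stringsAsFactors = FALSE)
dt$Num <- as.numeric(dt$Num)

# extra arguments to by() are passed on to sum(), so the NA in Area 3 is ignored
totals <- by(dt$Num, dt$Area, sum, na.rm = TRUE)
totals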
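The question also asks for the summary of each Area. One base R sketch uses split() to group Num by Area, then applies summary() or a small custom function to every group. The data is rebuilt here so the block runs on its own. The names m, groups, area_summaries and per_area are illustrative.

```r
m <- structure(c("Area 1", "Area 3", "Area 1", "Area 2", "Area 1",
                 "Area 3", "Area 4", "99", "85", "60", "90", "40", NA, "10"),
               .Dim = c(7L, 2L), .Dimnames = list(NULL, c("Area", "Num")))
num <- as.numeric(m[, "Num"])
groups <- split(num, m[, "Area"])   # list of Num vectors, one per Area (sorted)

# full summary() per area; an NA's count appears where values are missing
area_summaries <- lapply(groups, summary)
area_summaries[["Area 3"]]

# a few chosen statistics per area, one row each
per_area <- t(sapply(groups, function(x)
  c(n = sum(!is.na(x)), sum = sum(x, na.rm = TRUE), mean = mean(x, na.rm = TRUE))))
per_area
```

Swapping the anonymous function for any other function of a numeric vector gives other per-area statistics in the same shape.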

Upvotes: 2

Kunal Puri

Reputation: 3427

The same calculation using data.table:

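library(data.table)

# the character matrix from the question; Num is converted inside the query
dt <- as.data.table(structure(c("Area 1", "Area 3", "Area 1", "Area 2", "Area 1",
                                "Area 3", "Area 4", "99", "85", "60", "90", "40", NA, "10"),
                              .Dim = c(7L, 2L), .Dimnames = list(NULL, c("Area", "Num"))))

dt[, sum(as.numeric(Num), na.rm = TRUE), by = Area]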
##         Area  V1
##    1: Area 1 199
##    2: Area 3  85
##    3: Area 2  90
##    4: Area 4  10

Upvotes: 1

shrgm

Reputation: 1344

You can do this with aggregate(), but the dplyr package makes problems like this very easy to handle. There are plenty of duplicates of this question, though.

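library(dplyr)

df <- structure(c("Area 1", "Area 3", "Area 1", "Area 2", "Area 1",
                  "Area 3", "Area 4", "99", "85", "60", "90", "40", NA, "10"),
                .Dim = c(7L, 2L), .Dimnames = list(NULL, c("Area", "Num")))

# keep Num as character so as.numeric() converts the values, not factor codes
# (before R 4.0.0, data.frame() turned strings into factors by default)
df <- data.frame(df, stringsAsFactors = FALSE)
df$Num <- as.numeric(df$Num)

df2 <- df %>%
  group_by(Area) %>%
  summarise(totalNum = sum(Num, na.rm = TRUE))

df2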
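The aggregate() route mentioned at the start of this answer needs no extra packages. Here is a minimal sketch that rebuilds the data so it runs on its own. With the formula interface, rows where Num is NA are dropped by the default na.action = na.omit before summing, and the result is sorted by Area.

```r
m <- structure(c("Area 1", "Area 3", "Area 1", "Area 2", "Area 1",
                 "Area 3", "Area 4", "99", "85", "60", "90", "40", NA, "10"),
               .Dim = c(7L, 2L), .Dimnames = list(NULL, c("Area", "Num")))
df <- data.frame(m, stringsAsFactors = FALSE)
df$Num <- as.numeric(df$Num)

# one row per Area: Num ~ Area means "summarise Num within each Area"
totals <- aggregate(Num ~ Area, data = df, FUN = sum)
totals
```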

Upvotes: 2
