Sohan Shirodkar

Reputation: 520

Use apply() to a subset of dataframe in R

I am a newbie in this world of R. From whatever I have read, I know that apply() is used to iterate over each row/column in a matrix/vector/dataframe.

I have this statement in my code:

a$count <- apply(a[1:9,],1,countRows,species="setosa")

The countRows function is as follows:

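countRows <- function(x,species){
    # x is one row of a; count the rows of iris with the same sl and sw
    # values and the given species (assumes iris already has sl and sw columns)
    sum(iris$sl == x['sl'] & iris$sw == x['sw'] & iris$Species == species)
}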

My intention is to operate on only the first 9 rows of a and fill in the count column with whatever countRows() computes. This is why I pass a[1:9,] as the first argument of apply().

For some reason, apply() seems to operate on the complete dataframe. The content of a after executing the above statement is shown below:

     sl   sw count    species
1   low  low     1     setosa
2   mid  low     0     setosa
3  high  low     0     setosa
4   low  mid    32     setosa
5   mid  mid     1     setosa
6  high  mid     0     setosa
7   low high    12     setosa
8   mid high     4     setosa
9  high high     0     setosa
10  low  low     1 versicolor
11  mid  low     0 versicolor
12 high  low     0 versicolor
13  low  mid    32 versicolor
14  mid  mid     1 versicolor
15 high  mid     0 versicolor
16  low high    12 versicolor
17  mid high     4 versicolor
18 high high     0 versicolor
19  low  low     1  virginica
20  mid  low     0  virginica
21 high  low     0  virginica
22  low  mid    32  virginica
23  mid  mid     1  virginica
24 high  mid     0  virginica
25  low high    12  virginica
26  mid high     4  virginica
27 high high     0  virginica

I expected the remaining 18 rows to contain 0 in the count column, because I set everything to 0 initially.

Am I doing anything wrong in the apply() statement?

Upvotes: 0

Views: 57

Answers (1)

Ben Bolker

Reputation: 226911

As you mentioned in the comments, the narrow solution to your problem is to assign just to the bits of count you want to change:

a[1:9,'count'] <- apply(a[1:9,],1,countRows,species="setosa")
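The reason the original statement filled all 27 rows is most likely recycling: a$count <- ... replaces the whole column, and a data frame silently repeats a shorter vector when the number of rows is an exact multiple of its length (here 27 = 3 x 9). A minimal sketch of the difference, using a made-up data frame (toy and its values are purely illustrative, not your data):

```r
# Hypothetical toy data: 6 rows, count initialised to 0
toy <- data.frame(id = 1:6, count = 0)

# Assigning 3 values to the whole column recycles them across all 6 rows
recycled <- toy
recycled$count <- c(10, 20, 30)

# Assigning to rows 1:3 only leaves the remaining rows untouched
subsetted <- toy
subsetted[1:3, "count"] <- c(10, 20, 30)

print(recycled$count)   # 10 20 30 10 20 30
print(subsetted$count)  # 10 20 30  0  0  0
```

The second form is the same pattern as the a[1:9,'count'] assignment above.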

However, in looking at what you seem to be trying to do here, I have an alternate suggestion.

This is just to make up some data that look like yours:

cutnum <- ggplot2::cut_number
my_iris <- with(iris,
     data.frame(sw=cutnum(Sepal.Width,3,labels=c("low","mid","high")),
                sl=cutnum(Sepal.Length,3,labels=c("low","mid","high")),
                species=Species))
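If you would rather not depend on ggplot2, something similar can be sketched in base R with cut() and quantile(). The helper cut_equal below is a hypothetical name introduced for illustration, and I am assuming (not guaranteeing) that it gives bins close to those of cut_number; it also assumes the quantile breaks are distinct, which holds for these two iris columns:

```r
# Hypothetical helper (not part of ggplot2): bin a numeric vector into
# n groups of roughly equal size using quantile breaks
cut_equal <- function(x, n, labels) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = n + 1))
  cut(x, breaks = breaks, labels = labels, include.lowest = TRUE)
}

lev <- c("low", "mid", "high")
my_iris_base <- with(iris,
     data.frame(sw = cut_equal(Sepal.Width, 3, lev),
                sl = cut_equal(Sepal.Length, 3, lev),
                species = Species))

head(my_iris_base)
```

my_iris_base can then be used wherever my_iris appears.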

Now table() will count each combination of values, giving (in this case) a 3x3x3 array, and as.data.frame() will convert the table to long format:

as.data.frame(table(my_iris))

The results look like this:

      sw   sl    species Freq
 1   low  low     setosa    2
 2   mid  low     setosa   15
 3  high  low     setosa   28
 4   low  mid     setosa    0
 5   mid  mid     setosa    0
 ...
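If you only want the counts for one species, as in your original apply() call, you can filter the long-format result. A minimal sketch, where binned is a small made-up stand-in for the binned data (its values are invented for illustration and it has only two species):

```r
# Hypothetical stand-in for the binned data (values made up for illustration)
lev <- c("low", "mid", "high")
binned <- data.frame(
  sw      = factor(c("low", "low", "mid", "high", "low"), levels = lev),
  sl      = factor(c("low", "low", "mid", "high", "mid"), levels = lev),
  species = factor(c("setosa", "setosa", "setosa", "virginica", "virginica"))
)

# One row per sw/sl/species combination, including combinations with count 0
counts <- as.data.frame(table(binned))

# Keep only the rows for a single species
setosa_counts <- subset(counts, species == "setosa")

dim(table(binned))     # 3 3 2
nrow(setosa_counts)    # 9
```

Because the factors keep all three levels, combinations that never occur still appear with Freq equal to 0, which matches the zero-filled layout you were aiming for.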

Upvotes: 1
