user2096647
user2096647

Reputation: 99

R: Grouping two rows into a new row based on factor identity

Within a large dataframe, I am trying to create a new row that groups specific data from other rows, based on the identity of another factor. Here is some example data:

> Species    Status    Value
> A         Introduced   10
> A          Native      3
> B          Crypt       6
> C         Introduced   19
> C          Native      4

For each Species, I would like to create a new row, that only takes the data for the Status "Introduced" or "Crypt", and ignores the data in "Native" Status. Each species either has data for only "Introduced" and "Native" or only "Crypt".

Thus, my desired output will look like this:

> Species    Status    Value
> A         Introduced   10
> A          Native      3
> A         IC.Total     10
> B          Crypt       6
> B         IC.Total     6
> C         Introduced   19
> C          Native      4
> C         IC.Total     19

Is a for loop the best way to go about this, or is there a more elegant way? Any suggestions would be great-thanks for your help!

Upvotes: 3

Views: 464

Answers (2)

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193657

You can use merge and aggregate (even though there isn't anything to aggregate):

merge(mydf, 
      cbind(aggregate(Value ~ Species, mydf, sum, 
                      subset = c(Status != "Native")), 
            Status = "IC.Total"),
      all = TRUE)
#   Species     Status Value
# 1       A Introduced    10
# 2       A     Native     3
# 3       A   IC.Total    10
# 4       B      Crypt     6
# 5       B   IC.Total     6
# 6       C Introduced    19
# 7       C     Native     4
# 8       C   IC.Total    19

I've used aggregate because it has a convenient argument that lets you subset your data. In this case, we're not interested in "Native". Furthermore, we know that we will never have "Introduced" and "Crypt" together for one species, and we know that there will never be more than one value for "Introduced" or "Crypt", so using sum as our aggregation function wouldn't change anything.


Update

This concept behind this solution works even when you have more than one "Value" variable, as you indicate you have in your comment, but a few slight modifications need to be made, as demonstrated below.

First, let's make up some data:

mydf <- data.frame(
  Species = c("A", "A", "B", "C", "C"),
  Status = c("Introduced", "Native", "Crypt", "Introduced", "Native"),
  Value1 = c(10, 3, 6, 19, 4),
  Value2 = c(6, 8, 12, 19, 5),
  Value3 = c(18, 19, 14, 13, 2))
mydf
#   Species     Status Value1 Value2 Value3
# 1       A Introduced     10      6     18
# 2       A     Native      3      8     19
# 3       B      Crypt      6     12     14
# 4       C Introduced     19     19     13
# 5       C     Native      4      5      2

Second, use aggregate and merge as before, but note the slight differences. First, we can't use subset the way we did earlier, so instead of aggregating over the whole dataset, aggregate only over the rows we're interested in. Second, we've added "Status" in as a grouping variable, which shouldn't make any difference in your outcome, from what you've described as the current structure of your data. Third, after we aggregate, we need to remove the "Status" column and add in a new status column (That's what the [-2] code is doing--removing the second column.)

Here it is, all in one tidy package:

merge(mydf, 
      cbind(aggregate(. ~ Species + Status, 
                      mydf[mydf$Status != "Native", ], sum)[-2], 
            Status = "IC.Total"),
      all = TRUE)
#   Species     Status Value1 Value2 Value3
# 1       A Introduced     10      6     18
# 2       A     Native      3      8     19
# 3       A   IC.Total     10      6     18
# 4       B      Crypt      6     12     14
# 5       B   IC.Total      6     12     14
# 6       C Introduced     19     19     13
# 7       C     Native      4      5      2
# 8       C   IC.Total     19     19     13

Upvotes: 1

Ricardo Saporta
Ricardo Saporta

Reputation: 55410

The following uses the data.table package.
Assuming your original data.frame is called myDat:

library(data.table)
myDT <- data.table(myDat, key="Species")

# Creates a new DT, of only the Speices column
myDT2 <- setkey(unique(myDT[, list(Species)]), "Species")

# Add IC.Total values
myDT2[myDT[Status=="Introduced"], c("Status", "ValueC") := list("IC.Total", Value)]

# Add Crypt values
myDT2[myDT[Status=="Crypt"], c("Status", "ValueC") := list("Crypt", Value)]

# fix the column name
setnames(myDT2, "ValueC", "Value")

# combine and sort by speicies
myDT <- setkey(rbind(myDT, myDT2), "Species")

myDT
#    Species     Status Value
# 1:       A Introduced    10
# 2:       A     Native     3
# 3:       A   IC.Total    10
# 4:       B      Crypt     6
# 5:       B      Crypt     6
# 6:       C Introduced    19
# 7:       C     Native     4
# 8:       C   IC.Total    19

Note, if you dont want to duplicate the crypt count, simply take out that line above.

Upvotes: 2

Related Questions