Reputation: 99
Within a large dataframe, I am trying to create a new row that groups specific data from other rows, based on the identity of another factor. Here is some example data:
> Species Status Value
> A Introduced 10
> A Native 3
> B Crypt 6
> C Introduced 19
> C Native 4
For each Species, I would like to create a new row, that only takes the data for the Status "Introduced" or "Crypt", and ignores the data in "Native" Status. Each species either has data for only "Introduced" and "Native" or only "Crypt".
Thus, my desired output will look like this:
> Species Status Value
> A Introduced 10
> A Native 3
> A IC.Total 10
> B Crypt 6
> B IC.Total 6
> C Introduced 19
> C Native 4
> C IC.Total 19
Is a for loop the best way to go about this, or is there a more elegant way? Any suggestions would be great-thanks for your help!
Upvotes: 3
Views: 464
Reputation: 193657
You can use merge
and aggregate
(even though there isn't anything to aggregate):
merge(mydf,
cbind(aggregate(Value ~ Species, mydf, sum,
subset = c(Status != "Native")),
Status = "IC.Total"),
all = TRUE)
# Species Status Value
# 1 A Introduced 10
# 2 A Native 3
# 3 A IC.Total 10
# 4 B Crypt 6
# 5 B IC.Total 6
# 6 C Introduced 19
# 7 C Native 4
# 8 C IC.Total 19
I've used aggregate
because it has a convenient argument that lets you subset your data. In this case, we're not interested in "Native". Furthermore, we know that we will never have "Introduced" and "Crypt" together for one species, and we know that there will never be more than one value for "Introduced" or "Crypt", so using sum
as our aggregation function wouldn't change anything.
This concept behind this solution works even when you have more than one "Value" variable, as you indicate you have in your comment, but a few slight modifications need to be made, as demonstrated below.
First, let's make up some data:
mydf <- data.frame(
Species = c("A", "A", "B", "C", "C"),
Status = c("Introduced", "Native", "Crypt", "Introduced", "Native"),
Value1 = c(10, 3, 6, 19, 4),
Value2 = c(6, 8, 12, 19, 5),
Value3 = c(18, 19, 14, 13, 2))
mydf
# Species Status Value1 Value2 Value3
# 1 A Introduced 10 6 18
# 2 A Native 3 8 19
# 3 B Crypt 6 12 14
# 4 C Introduced 19 19 13
# 5 C Native 4 5 2
Second, use aggregate
and merge
as before, but note the slight differences. First, we can't use subset
the way we did earlier, so instead of aggregating over the whole dataset, aggregate only over the rows we're interested in. Second, we've added "Status" in as a grouping variable, which shouldn't make any difference in your outcome, from what you've described as the current structure of your data. Third, after we aggregate, we need to remove the "Status" column and add in a new status column (That's what the [-2]
code is doing--removing the second column.)
Here it is, all in one tidy package:
merge(mydf,
cbind(aggregate(. ~ Species + Status,
mydf[mydf$Status != "Native", ], sum)[-2],
Status = "IC.Total"),
all = TRUE)
# Species Status Value1 Value2 Value3
# 1 A Introduced 10 6 18
# 2 A Native 3 8 19
# 3 A IC.Total 10 6 18
# 4 B Crypt 6 12 14
# 5 B IC.Total 6 12 14
# 6 C Introduced 19 19 13
# 7 C Native 4 5 2
# 8 C IC.Total 19 19 13
Upvotes: 1
Reputation: 55410
The following uses the data.table
package.
Assuming your original data.frame is called myDat
:
library(data.table)
myDT <- data.table(myDat, key="Species")
# Creates a new DT, of only the Speices column
myDT2 <- setkey(unique(myDT[, list(Species)]), "Species")
# Add IC.Total values
myDT2[myDT[Status=="Introduced"], c("Status", "ValueC") := list("IC.Total", Value)]
# Add Crypt values
myDT2[myDT[Status=="Crypt"], c("Status", "ValueC") := list("Crypt", Value)]
# fix the column name
setnames(myDT2, "ValueC", "Value")
# combine and sort by speicies
myDT <- setkey(rbind(myDT, myDT2), "Species")
myDT
# Species Status Value
# 1: A Introduced 10
# 2: A Native 3
# 3: A IC.Total 10
# 4: B Crypt 6
# 5: B Crypt 6
# 6: C Introduced 19
# 7: C Native 4
# 8: C IC.Total 19
Note, if you dont want to duplicate the crypt
count, simply take out that line above.
Upvotes: 2