Reputation: 83275
In my dataset I have a continuous variable mag
from which I derived a categorial variable mag.cat
which has four categories: 1 for the values between 0 and 1 for mag
, 2 for the values between 1 and 2 for mag
, 3 for the values between 2 and 3 for mag
& 4 for the values above 3 for mag
. A subset of the data looks like this:
location mag depth mag.cat
1 Assen 1.8 1.0 2
2 Hooghalen 2.5 1.5 3
3 Purmerend 0.7 1.2 1
4 Emmen 2.2 3.0 3
5 Geelbroek 3.6 3.0 4
6 Eleveld 2.7 3.0 3
I want to summarize this dataframe into a new one with just one row for each location.
I did this with:
df.new <- ddply(df, .(location), summarise, n.tot = as.numeric(length(location)),
gem.mag = round(mean(mag),1),
n.1 = as.numeric(length(location[mag == 1])),
n.2 = as.numeric(length(location[mag == 2])),
n.3 = as.numeric(length(location[mag == 3])),
n.4 = as.numeric(length(location[mag == 4]))
)
The n.1, n.2, n.3 & n.4 variables are supposed to contain the counts for each category for each location. The sum of these variables should logically be equal to n.tot
, but they aren't. This can be seen in the head of the new dataframe:
location n.tot gem.mag n.1 n.2 n.3 n.4
1 Achterdiep 5 1.1 2 0 0 0
2 Alkmaar 4 3.2 0 0 1 0
3 Altena 1 1.3 0 0 0 0
4 Amelanderwad 2 1.8 0 0 0 0
5 Amen 6 1.1 0 0 0 0
6 Amerbrug 1 0.9 0 0 0 0
I expected something like:
location n.tot gem.mag n.1 n.2 n.3 n.4
1 Achterdiep 5 1.1 2 2 0 1
2 Alkmaar 4 3.2 0 3 1 0
3 Altena 1 1.3 0 1 0 0
4 Amelanderwad 2 1.8 0 1 1 0
5 Amen 6 1.1 3 2 0 1
6 Amerbrug 1 0.9 1 0 0 0
What am I doing wrong?
Upvotes: 0
Views: 523
Reputation: 263451
Why not:
res <- xtabs( ~ location + mag.cat, data=df)
res
If you want totals as a column, then cbind(tot.n= rowSums(res), res)
.
Means of mag: with(df, tapply(mag, location, mean))
Everything:
cbind( gem.mag= with(df, tapply(mag, location, mean)),
tot.n= rowSums(res),
res)
I suppose the answer is not complete without a plyr version:
require(plyr)
df.new <- ddply(df, .(location), summarise, n.tot = as.numeric(length(location)),
gem.mag = round(mean(mag),1),
n.1 = sum(mag.cat==1),
n.2 = sum(mag.cat==2),
n.3 = sum(mag.cat==3),
n.4 = sum(mag.cat==4)
)
> df.new
location n.tot gem.mag n.1 n.2 n.3 n.4
1 Assen 1 1.8 0 1 0 0
2 Eleveld 1 2.7 0 0 1 0
3 Emmen 1 2.2 0 0 1 0
4 Geelbroek 1 3.6 0 0 0 1
5 Hooghalen 1 2.5 0 0 1 0
6 Purmerend 1 0.7 1 0 0 0
Upvotes: 3