I'm a beginner in R, and pretty much everything I do comes from the typical methodology I've learned in other languages. However, whenever I've searched for R-related answers here, the code structure was quite different from what I'd have expected.
I have a data.table that contains panel data for individuals. I want to compute each individual's mean outcome for a characteristic, and then split the sample in two: those whose mean is above the median of these individual means, and those whose mean is below it.
Here's the structure of my data.table, yearly:
user wage year
1: 65122111 9.74 2003
2: 65122111 7.85 2004
3: 65122111 97.16 2005
4: 65122111 48.22 2006
5: 65122111 91.24 2007
6: 65122111 9.35 2008
7: 65122112 80.00 2007
8: 65122112 0.00 2008
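To make the code in this thread easy to run and check, here is a minimal sketch that rebuilds the eight printed rows as a data.table. The column types are an assumption (the printout doesn't show them): user IDs and years are stored as integers, wages as doubles.

```r
library(data.table)

# Rebuild the eight rows printed above (assumed types: integer user/year, double wage)
yearly <- data.table(
  user = c(rep(65122111L, 6), rep(65122112L, 2)),
  wage = c(9.74, 7.85, 97.16, 48.22, 91.24, 9.35, 80.00, 0.00),
  year = c(2003:2008, 2007:2008)
)
```

With these rows, user 65122111 has a mean wage of about 43.93 and user 65122112 a mean wage of 40.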
And here's what I do:
library(data.table)
## get mean wage per user (one row per user)
meanWages <- yearly[, list(meanWage = mean(wage)), by = user]
## split users by the median of the per-user means
## (strict inequalities: a user exactly at the median lands in neither group)
highWage <- meanWages[meanWage > median(meanWage), user]
lowWage <- meanWages[meanWage < median(meanWage), user]
## split original sample
yearlyHigh <- yearly[is.element(user, highWage), ]
yearlyLow <- yearly[is.element(user, lowWage), ]
I suppose this gives me what I expect (checking for correctness is quite cumbersome), but it seems very clunky and inefficient. What would be a more efficient and compact way of doing the same thing?
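Checking the split does not have to be cumbersome: two assertions catch the most likely mistakes (a user in both halves, or rows lost or duplicated). A sketch on the sample rows, assuming users exactly at the median should go to the low group (the question's strict `<` would drop them, which is why the row-count check would otherwise fail in that case):

```r
library(data.table)

# Sample panel: the eight rows shown in the question
yearly <- data.table(
  user = c(rep(65122111L, 6), rep(65122112L, 2)),
  wage = c(9.74, 7.85, 97.16, 48.22, 91.24, 9.35, 80.00, 0.00),
  year = c(2003:2008, 2007:2008)
)

# Median of the per-user means (one value per user, as in the question)
meanWages <- yearly[, .(meanWage = mean(wage)), by = user]
med <- median(meanWages$meanWage)
yearlyHigh <- yearly[user %in% meanWages[meanWage > med, user]]
# Assumption: ties at the median go to the low group
yearlyLow <- yearly[user %in% meanWages[meanWage <= med, user]]

# Sanity checks: no user in both halves, and no rows lost or duplicated
stopifnot(
  length(intersect(yearlyHigh$user, yearlyLow$user)) == 0,
  nrow(yearlyHigh) + nrow(yearlyLow) == nrow(yearly)
)
```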
Upvotes: 4
You can also use the dplyr package. It might not be as efficient, but it is very easy to read.
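library(dplyr)
yearly %>%
group_by(user) %>%
mutate(meanwage = mean(wage)) %>%
# ungroup() so the median is taken across all rows; within a single user,
# median(meanwage) equals meanwage and the filter would keep every row
ungroup() %>%
filter(meanwage >= median(meanwage))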
It is rarely helpful to actually split the data. Instead, group by the wage category and use groupwise operations.
yearly %>%
group_by(user) %>%
mutate(meanwage = mean(wage)) %>%
ungroup %>%
mutate(cat = ifelse(meanwage >= median(meanwage), "high", "low")) %>%
group_by(cat) %>%
do(data.table("further analyses here ..."))
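The do() call above is only a placeholder. As one concrete, hypothetical choice of "further analyses", a per-category summary could use summarise() instead of do(); the summary columns users and meanWage are illustrative names, not part of the original answer:

```r
library(dplyr)

# Sample panel: the eight rows shown in the question
yearly <- data.frame(
  user = c(rep(65122111L, 6), rep(65122112L, 2)),
  wage = c(9.74, 7.85, 97.16, 48.22, 91.24, 9.35, 80.00, 0.00),
  year = c(2003:2008, 2007:2008)
)

byCat <- yearly %>%
  group_by(user) %>%
  mutate(meanwage = mean(wage)) %>%
  ungroup() %>%
  mutate(cat = ifelse(meanwage >= median(meanwage), "high", "low")) %>%
  group_by(cat) %>%
  # Hypothetical analysis: number of users and mean wage per category
  summarise(users = n_distinct(user), meanWage = mean(wage))
```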
Or, using only data.table:
yearly[, meanwage := mean(wage), by=user]
yearly[, cat := ifelse(meanwage >= median(meanwage), "high", "low")]
yearly[, "further analyses here ...", by = cat]
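One caveat with the lines above: median(meanwage) is computed over rows, so a user observed for more years is counted more often, whereas the question took the median over users. A sketch that keeps the per-user median, reusing this answer's column names and assuming ties at the median go to "high" as above (userMedian is a hypothetical helper variable):

```r
library(data.table)

# Sample panel: the eight rows shown in the question
yearly <- data.table(
  user = c(rep(65122111L, 6), rep(65122112L, 2)),
  wage = c(9.74, 7.85, 97.16, 48.22, 91.24, 9.35, 80.00, 0.00),
  year = c(2003:2008, 2007:2008)
)

# Per-user mean wage, added as a column by reference
yearly[, meanwage := mean(wage), by = user]
# Median over one row per user, so each user counts once regardless of panel length
userMedian <- median(unique(yearly, by = "user")$meanwage)
yearly[, cat := ifelse(meanwage >= userMedian, "high", "low")]

# Groupwise operation instead of splitting: a summary per category
yearly[, .(users = uniqueN(user), meanWage = mean(wage)), by = cat]
```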
Upvotes: 3
You can try the following, although I can't be certain that it is the most efficient or compact approach.
yearly[, meanwage := mean(wage), by=user]
yearlyHigh <- yearly[meanwage >= median(meanwage)]
yearlyLow <- yearly[meanwage < median(meanwage)]
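If separate tables are really needed, data.table's split() method can produce both halves in one call. A sketch, assuming a data.table release recent enough to provide split() for data.tables with the by= argument; the column name group is hypothetical:

```r
library(data.table)

# Sample panel: the eight rows shown in the question
yearly <- data.table(
  user = c(rep(65122111L, 6), rep(65122112L, 2)),
  wage = c(9.74, 7.85, 97.16, 48.22, 91.24, 9.35, 80.00, 0.00),
  year = c(2003:2008, 2007:2008)
)

yearly[, meanwage := mean(wage), by = user]
# Label each row; as in the answer above, the median is taken over rows
yearly[, group := ifelse(meanwage >= median(meanwage), "high", "low")]
# One data.table per value of 'group', returned as a named list
halves <- split(yearly, by = "group")
```

halves$high and halves$low then play the roles of yearlyHigh and yearlyLow.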
Upvotes: 3