Reputation: 161
I'm trying to add a percentage column to each data frame produced by split().
I have written the following, which works:
percs <- function(agg, deporur=0, all=TRUE, full=FALSE){
  work <- data.frame(NoNA$IMD_NATIONAL_QUINTILE, NoNA$UR, agg)
  work <- as.data.frame(table(work))
  work <- split(work, work[, deporur])
  work
}
With my data, this returns:
$`1`
NoNA.IMD_NATIONAL_QUINTILE NoNA.UR agg Freq
1 1 0 1 0
6 1 Rural 1 0
11 1 Urban 1 43
16 1 0 2 0
21 1 Rural 2 0
26 1 Urban 2 37
$`2`
NoNA.IMD_NATIONAL_QUINTILE NoNA.UR agg Freq
2 2 0 1 0
7 2 Rural 1 3
12 2 Urban 1 30
17 2 0 2 0
22 2 Rural 2 1
27 2 Urban 2 27
$`3`
NoNA.IMD_NATIONAL_QUINTILE NoNA.UR agg Freq
3 3 0 1 0
8 3 Rural 1 7
13 3 Urban 1 25
18 3 0 2 0
23 3 Rural 2 3
28 3 Urban 2 13
$`4`
NoNA.IMD_NATIONAL_QUINTILE NoNA.UR agg Freq
4 4 0 1 0
9 4 Rural 1 9
14 4 Urban 1 30
19 4 0 2 0
24 4 Rural 2 0
29 4 Urban 2 18
$`5`
NoNA.IMD_NATIONAL_QUINTILE NoNA.UR agg Freq
5 5 0 1 0
10 5 Rural 1 13
15 5 Urban 1 40
20 5 0 2 0
25 5 Rural 2 11
30 5 Urban 2 27
I want to add an extra column at the end of each data frame showing each row's share of that group's total Freq.
I can make it work for a single element in the console as follows:
test<-percs(NoNA$Q1, 1)
test$"1"$newcol <- test$"1"[,4]/sum(test$"1"[,4])
test$"1"
NoNA.IMD_NATIONAL_QUINTILE NoNA.UR agg Freq newcol
1 1 0 1 0 0.0000
6 1 Rural 1 0 0.0000
11 1 Urban 1 43 0.5375
16 1 0 2 0 0.0000
21 1 Rural 2 0 0.0000
26 1 Urban 2 37 0.4625
However, I cannot work out how to do this in a loop that goes through every data frame stored in the work list and adds an extra column. If I access an element with the $ operator, I get the data frame and can work with it, but if I use the [] operator, as I normally would in a for loop, I get back a list and can't add a column to it.
Any thoughts on where I am going wrong here?
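To illustrate the [] vs [[]] behaviour described above, here is a minimal sketch using a small made-up list (`lst`, `g` and `prop` are hypothetical names standing in for the list returned by split()):

```r
# Hypothetical stand-in for the list returned by split()
lst <- split(data.frame(g = c("a", "a", "b"), Freq = c(1, 3, 2)),
             c("a", "a", "b"))

class(lst[1])    # "list": single brackets return a one-element sub-list
class(lst[[1]])  # "data.frame": double brackets return the element itself

# Adding a column therefore has to go through [[ ]]
for (i in seq_along(lst)) {
  lst[[i]]$prop <- lst[[i]]$Freq / sum(lst[[i]]$Freq)
}
lst[["a"]]$prop  # 0.25 0.75
```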
Upvotes: 2
Views: 67
Reputation: 161
Looks like I got there in the end, thank you all very much for your assistance.
The issue was using [] where [[]] was needed:
percs <- function(agg, deporur=0, all=TRUE, full=FALSE){
  work <- data.frame(NoNA$IMD_NATIONAL_QUINTILE, NoNA$UR, agg)
  work <- as.data.frame(table(work))
  work <- split(work, work[, deporur])
  for (i in seq_along(work)) {
    # work[[i]] is the data frame itself; work[i] would be a one-element list
    work[[i]]$NewCol <- work[[i]]$Freq / sum(work[[i]]$Freq)
  }
  work
}
Upvotes: 1
Reputation: 16121
Here's a toy dataset:
> dt <- expand.grid(type=1:2, qty=1:5)
> dt = split(dt, dt$type)
>
> dt
$`1`
type qty
1 1 1
3 1 2
5 1 3
7 1 4
9 1 5
$`2`
type qty
2 2 1
4 2 2
6 2 3
8 2 4
10 2 5
Here is the loop (if you really want a loop) using [[]] instead of []:
> for (i in 1:length(dt)){
+ dt[[i]]$prc = dt[[i]]$qty/sum(dt[[i]]$qty)
+ }
>
> dt
$`1`
type qty prc
1 1 1 0.06666667
3 1 2 0.13333333
5 1 3 0.20000000
7 1 4 0.26666667
9 1 5 0.33333333
$`2`
type qty prc
2 2 1 0.06666667
4 2 2 0.13333333
6 2 3 0.20000000
8 2 4 0.26666667
10 2 5 0.33333333
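Because prc holds proportions (0 to 1) rather than percentages, here is a minimal sketch that checks each group sums to 1 and then adds a percentage column scaled by 100. The column name `pct` and the rounding to 2 decimal places are assumptions, not part of the original answer:

```r
dt <- expand.grid(type = 1:2, qty = 1:5)
dt <- split(dt, dt$type)
for (i in seq_along(dt)) {
  dt[[i]]$prc <- dt[[i]]$qty / sum(dt[[i]]$qty)
}

# Proportions within each group should add up to 1
sapply(dt, function(d) sum(d$prc))

# Hypothetical pct column: the same share as a percentage, rounded to 2 dp
dt <- lapply(dt, function(d) { d$pct <- round(100 * d$prc, 2); d })
dt[["1"]]$pct  # 6.67 13.33 20.00 26.67 33.33
```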
And here's a dplyr version that combines the list elements into one dataset:
> library(dplyr)
> dt <- expand.grid(type=1:2, qty=1:5)
> dt = split(dt, dt$type)
>
> do.call(rbind, dt) %>% group_by(type) %>% mutate(prc = qty/sum(qty)) %>% ungroup
Source: local data frame [10 x 3]
type qty prc
1 1 1 0.06666667
2 1 2 0.13333333
3 1 3 0.20000000
4 1 4 0.26666667
5 1 5 0.33333333
6 2 1 0.06666667
7 2 2 0.13333333
8 2 3 0.20000000
9 2 4 0.26666667
10 2 5 0.33333333
Upvotes: 1
Reputation: 7190
Posting this as an answer because my comment was getting too long.
Just use
perc <- lapply(work, function(x) x[, 4] / sum(x[, 4]))
and then append the result to your data. I can't test my code because your data is hard to read (at least for me); it would help if you provided a dput of your data. A dplyr approach would probably be better, though, something like this (where df is the table data frame before it is split):
library(dplyr)
df %>% group_by(NoNA.IMD_NATIONAL_QUINTILE) %>% mutate(perc = Freq / sum(Freq))
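The "append to your data" step could look like the minimal sketch below. Here `work` is a hypothetical two-element stand-in for the asker's split list (built from a few Freq values shown in the question, with Freq as column 4), and Map() pairs each data frame with its vector of proportions:

```r
# Hypothetical stand-in for the split list; column 4 is Freq, as in the question
work <- list(
  `1` = data.frame(Q = 1, UR = c("0", "Rural", "Urban"), agg = 1, Freq = c(0, 0, 43)),
  `2` = data.frame(Q = 2, UR = c("0", "Rural", "Urban"), agg = 1, Freq = c(0, 3, 30))
)

# Step 1: one vector of proportions per data frame
perc <- lapply(work, function(x) x[, 4] / sum(x[, 4]))

# Step 2: attach each vector to its matching data frame; Map() walks both lists
# in parallel and keeps the names of the first list
work <- Map(function(d, p) { d$perc <- p; d }, work, perc)
work[["2"]]$perc  # 0, 3/33, 30/33
```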
Upvotes: 4
Reputation: 218
Without a sample of your data I can't test my answer, but I think using ddply instead of split (or split after ddply if you want lists) is the way to go.
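I believe you should be able to do something like this, where work is the table data frame before splitting (transform keeps the existing columns, whereas summarize would return only the grouping column and newcol):
library(plyr)
test <- ddply(work, .(NoNA.IMD_NATIONAL_QUINTILE), transform, newcol = Freq/sum(Freq))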
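If plyr isn't available, the "split after" route can be sketched in base R with ave(), which applies a function within each group and returns a vector aligned with the original rows. `tab` and `by_quintile` are hypothetical names; `tab` stands in for the table data frame and uses a few of the Freq values shown in the question:

```r
# Hypothetical stand-in for the unsplit table data frame
tab <- data.frame(
  NoNA.IMD_NATIONAL_QUINTILE = rep(1:2, each = 3),
  NoNA.UR = rep(c("0", "Rural", "Urban"), times = 2),
  Freq = c(0, 0, 43, 0, 3, 30)
)

# ave() computes Freq / sum(Freq) within each quintile, row-aligned with tab
tab$newcol <- ave(tab$Freq, tab$NoNA.IMD_NATIONAL_QUINTILE,
                  FUN = function(v) v / sum(v))

# Split afterwards if a list of data frames is still wanted
by_quintile <- split(tab, tab$NoNA.IMD_NATIONAL_QUINTILE)
by_quintile[["2"]]
```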
Upvotes: 3
Reputation: 7248
Here's a simple version on some test data:
df <- expand.grid(type=1:10, qty=1:5)
split(df, df$type)
$`1`
type qty
1 1 1
11 1 2
21 1 3
31 1 4
41 1 5
$`2`
type qty
2 2 1
12 2 2
22 2 3
32 2 4
42 2 5
...
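Then, to compute the percentage, you can use lapply (note that this example computes a cumulative share with cumsum(); drop cumsum() to get each row's own share):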
> lapply(split(df, df$type), function(d) { d$asdf <- cumsum(d$qty)/sum(d$qty); d })
$`1`
type qty asdf
1 1 1 0.06666667
11 1 2 0.20000000
21 1 3 0.40000000
31 1 4 0.66666667
41 1 5 1.00000000
$`2`
type qty asdf
2 2 1 0.06666667
12 2 2 0.20000000
22 2 3 0.40000000
32 2 4 0.66666667
42 2 5 1.00000000
...
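For a plain (non-cumulative) share, the same lapply() pattern works without cumsum(). A minimal sketch on the same test data, where the names `res` and `share` are hypothetical:

```r
df <- expand.grid(type = 1:10, qty = 1:5)

# Each row's own share of its group total (no cumsum)
res <- lapply(split(df, df$type), function(d) {
  d$share <- d$qty / sum(d$qty)
  d
})
res[["1"]]$share  # 1/15, 2/15, 3/15, 4/15, 5/15
```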
Upvotes: 4