David Gerrard

Reputation: 161

Applying new columns after split

I am trying to add a percentage column to each data frame produced by split().

I have written the following function, which works:

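percs <- function(agg, deporur=0, all=TRUE, full=FALSE){
  # Cross-tabulate quintile, urban/rural flag and agg, then flatten to a data frame
  work <- data.frame(NoNA$IMD_NATIONAL_QUINTILE, NoNA$UR, agg)
  work <- as.data.frame(table(work))
  # Split into one data frame per level of column number deporur
  work <- split(work, work[, deporur])
  work
}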

With my data, this returns:

$`1`
   NoNA.IMD_NATIONAL_QUINTILE NoNA.UR agg Freq
1                           1       0   1    0
6                           1   Rural   1    0
11                          1   Urban   1   43
16                          1       0   2    0
21                          1   Rural   2    0
26                          1   Urban   2   37

$`2`
   NoNA.IMD_NATIONAL_QUINTILE NoNA.UR agg Freq
2                           2       0   1    0
7                           2   Rural   1    3
12                          2   Urban   1   30
17                          2       0   2    0
22                          2   Rural   2    1
27                          2   Urban   2   27

$`3`
   NoNA.IMD_NATIONAL_QUINTILE NoNA.UR agg Freq
3                           3       0   1    0
8                           3   Rural   1    7
13                          3   Urban   1   25
18                          3       0   2    0
23                          3   Rural   2    3
28                          3   Urban   2   13

$`4`
   NoNA.IMD_NATIONAL_QUINTILE NoNA.UR agg Freq
4                           4       0   1    0
9                           4   Rural   1    9
14                          4   Urban   1   30
19                          4       0   2    0
24                          4   Rural   2    0
29                          4   Urban   2   18

$`5`
   NoNA.IMD_NATIONAL_QUINTILE NoNA.UR agg Freq
5                           5       0   1    0
10                          5   Rural   1   13
15                          5   Urban   1   40
20                          5       0   2    0
25                          5   Rural   2   11
30                          5   Urban   2   27

I want to add an extra column at the end of each data frame showing each row's Freq as a share of that data frame's total.

I can make it work in the console as follows:

test<-percs(NoNA$Q1, 1)
test$"1"$newcol <- test$"1"[,4]/sum(test$"1"[,4])
test$"1"

   NoNA.IMD_NATIONAL_QUINTILE NoNA.UR agg Freq newcol
1                           1       0   1    0 0.0000
6                           1   Rural   1    0 0.0000
11                          1   Urban   1   43 0.5375
16                          1       0   2    0 0.0000
21                          1   Rural   2    0 0.0000
26                          1   Urban   2   37 0.4625

However, I cannot work out how to do this in a loop that goes through every data frame stored in the work list and adds the column. If I access an element with the $ operator I can work with the data frame, but when I use the [] operator, as I normally would in a for loop, I get a list back and cannot add a column.

Any thoughts on where I am going wrong here?

Upvotes: 2

Views: 67

Answers (5)

David Gerrard

Reputation: 161

Looks like I got there in the end, thank you all very much for your assistance.

The issue was that I needed [[]] rather than [] to reach each data frame in the list.

percs <- function(agg, deporur=0, all=TRUE, full=FALSE){
  work <- data.frame(NoNA$IMD_NATIONAL_QUINTILE, NoNA$UR, agg)
  work <- as.data.frame(table(work))
  work <- split(work, work[, deporur])

  # [[i]] reaches the i-th data frame itself; [i] would return a one-element list
  for(i in seq_along(work)){
    work[[i]]$NewCol <- work[[i]]$Freq / sum(work[[i]]$Freq)
  }

  work
}
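
To make the [] versus [[]] difference concrete, here is a minimal sketch on made-up data. The names df, lst, grp and share are hypothetical and are not part of the NoNA data from the question:

```r
# Hypothetical toy data; not the NoNA data from the question
df <- data.frame(grp = c("a", "a", "b"), Freq = c(1, 3, 4))
lst <- split(df, df$grp)

class(lst[1])    # "list": a one-element list that still wraps the data frame
class(lst[[1]])  # "data.frame": the data frame itself

# Because [[ ]] reaches the data frame, a column can be assigned inside a loop
for (i in seq_along(lst)) {
  lst[[i]]$share <- lst[[i]]$Freq / sum(lst[[i]]$Freq)
}
```

Using seq_along() rather than 1:length() also means the loop is skipped cleanly if the list happens to be empty.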

Upvotes: 1

AntoniosK

Reputation: 16121

Here's a sample dataset standing in for yours:

> dt <- expand.grid(type=1:2, qty=1:5)
> dt = split(dt, dt$type)
> 
> dt
$`1`
  type qty
1    1   1
3    1   2
5    1   3
7    1   4
9    1   5

$`2`
   type qty
2     2   1
4     2   2
6     2   3
8     2   4
10    2   5

Here is the loop (if you really want a loop) using [[]] instead of []:

> for (i in 1:length(dt)){
+ dt[[i]]$prc = dt[[i]]$qty/sum(dt[[i]]$qty)
+ }
> 
> dt
$`1`
  type qty        prc
1    1   1 0.06666667
3    1   2 0.13333333
5    1   3 0.20000000
7    1   4 0.26666667
9    1   5 0.33333333

$`2`
   type qty        prc
2     2   1 0.06666667
4     2   2 0.13333333
6     2   3 0.20000000
8     2   4 0.26666667
10    2   5 0.33333333

And here's a dplyr version that combines the list elements into one dataset:

> library(dplyr)
> dt <- expand.grid(type=1:2, qty=1:5)
> dt = split(dt, dt$type)
> 
> do.call(rbind, dt) %>% group_by(type) %>% mutate(prc = qty/sum(qty)) %>% ungroup
Source: local data frame [10 x 3]

   type qty        prc
1     1   1 0.06666667
2     1   2 0.13333333
3     1   3 0.20000000
4     1   4 0.26666667
5     1   5 0.33333333
6     2   1 0.06666667
7     2   2 0.13333333
8     2   3 0.20000000
9     2   4 0.26666667
10    2   5 0.33333333

Upvotes: 1

SabDeM

Reputation: 7190

My comment was getting too long, so here it is as an answer.

Just use

perc <- lapply(work, function(x) x[, 4] / sum(x[, 4]))

and then append the result to your data. I cannot test this because your data is hard to read (at least for me); it would help if you provided a dput of your data. That said, a dplyr approach would be better, something like:

df %>% group_by(NoNA.IMD_NATIONAL_QUINTILE) %>% mutate(perc  = Freq / sum(Freq))
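
The "append" step of the lapply approach could look something like the sketch below. It uses base R's Map() to pair each data frame with its vector of shares. The toy data and the column name perc are assumptions for illustration, not the data from the question:

```r
# Hypothetical toy data standing in for the split list `work`
df <- data.frame(grp = c(1, 1, 2, 2), Freq = c(43, 37, 30, 10))
work <- split(df, df$grp)

# One vector of shares per list element
perc <- lapply(work, function(x) x$Freq / sum(x$Freq))

# Pair each data frame with its vector and attach it as a new column
work <- Map(function(d, p) { d$perc <- p; d }, work, perc)
```

Map() keeps the names of the list, so the result can still be indexed by group exactly as before.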

Upvotes: 4

data paRty

Reputation: 218

Without a sample of your data I can't test my answer either, but I think using ddply instead of split (or split after ddply if you want a list) is the way to go.

I believe you should be able to do something like this:

library(plyr)
# work here is the data frame before split(); mutate keeps the existing columns
test <- ddply(work, .(NoNA.IMD_NATIONAL_QUINTILE), mutate, newcol = Freq/sum(Freq))

Upvotes: 3

user295691

Reputation: 7248

Here's a simple version on test data

df <- expand.grid(type=1:10, qty=1:5)
split(df, df$type)
$`1`
   type qty
1     1   1
11    1   2
21    1   3
31    1   4
41    1   5

$`2`
   type qty
2     2   1
12    2   2
22    2   3
32    2   4
42    2   5
...

Then, to compute the cumulative share within each group (drop cumsum() if you want each row's plain share), you can use lapply:

> lapply(split(df, df$type), function(d) { d$asdf <- cumsum(d$qty)/sum(d$qty); d })
$`1`
   type qty       asdf
1     1   1 0.06666667
11    1   2 0.20000000
21    1   3 0.40000000
31    1   4 0.66666667
41    1   5 1.00000000

$`2`
   type qty       asdf
2     2   1 0.06666667
12    2   2 0.20000000
22    2   3 0.40000000
32    2   4 0.66666667
42    2   5 1.00000000
...
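
If the plain (non-cumulative) share is what is wanted, the same lapply pattern should work with cumsum() dropped. A minimal sketch on the same kind of test data; the names out and share are arbitrary choices:

```r
# Test data in the same shape as above, with two groups for brevity
df <- expand.grid(type = 1:2, qty = 1:5)

# Add each row's share of its group's total, returning the modified data frame
out <- lapply(split(df, df$type), function(d) {
  d$share <- d$qty / sum(d$qty)
  d
})
```

The function has to end by returning d; otherwise lapply would collect only the value of the assignment, which is the share vector rather than the data frame.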

Upvotes: 4
