Reputation: 320

Apply a function on a data-frame and return a data-frame

I have a data frame like this

 ID  07  08  09  10  year balance

abc   0   0   0   0  09    2123.00
efg   0   0   0   0  09    780.4
xyz   0   0   0   0  07    2402.9
prq   0   0   0   0  10    123.3
mno   0   0   0   0  07    679

I need to fill columns 07, 08, 09 and 10 based on column "year" and values in balance. For each ID, the column corresponding to value in column year is filled with value in balance. Apply this row-by-row.

For example, for 1st row, year is 09, so column 09 for that ID is filled with value 2123.00. Rest all year values remain 0.

For 3rd row, value of 24502.9 is filled in column 07 since its year value is 07. And so on row by row..

I output should come like this

 ID  07      08  09      10    year  balance

abc   0      0  2123.00  0      09    2123.00
efg   0      0  780.4    0      09    780.4
xyz  2402.9  0   0       0      07    2402.9
prq   0      0   0      123.3   10    123.3
mno  679     0   0       0      07    679

PS: I have already written a for loop for this. I need something faster than for loop. I'm actually dealing with thousands of data. I dont know if there is anything similar to apply that returns a data-frame

Upvotes: 0

Answers (4)

SatZ

Reputation: 450

You can use the data.table and reshape2 packages to do this.

You could use ID, year, balance columns of your data.frame and dcast with ID as the row and year in the columns and balance in the value

library(reshape2) 
library(data.table) 
final_output<-dcast(setDT(df),ID~year, value.var="balance")

If you want to re-order your columns you could use the snippet from the reference below: Reordering dcast data frame

final_output<-dcast(setDT(df),ID~reorder(year,year), value.var="balance")

Upvotes: 1

jay.sf

Reputation: 73592

I'm sure you want this

do.call(rbind, lapply(1:nrow(df1), function(i) {
  df1[i, df1[i, 6]] <- df1[i, 7] 
  df1[i, ]
  }))

Yields

   ID     07 08     09    10 year balance
1 abc    0.0  0 2123.0   0.0   09  2123.0
2 efg    0.0  0  780.4   0.0   09   780.4
3 xyz 2402.9  0    0.0   0.0   07  2402.9
4 prq    0.0  0    0.0 123.3   10   123.3
5 mno  679.0  0    0.0   0.0   07   679.0

Data

df1 <- structure(list(ID = structure(c(1L, 2L, 5L, 4L, 3L), .Label = c("abc", 
"efg", "mno", "prq", "xyz"), class = "factor"), `07` = c(0L, 
0L, 0L, 0L, 0L), `08` = c(0L, 0L, 0L, 0L, 0L), `09` = c(0L, 0L, 
0L, 0L, 0L), `10` = c(0L, 0L, 0L, 0L, 0L), year = c("09", "09", 
"07", "10", "07"), balance = c(2123, 780.4, 2402.9, 123.3, 679
)), row.names = c(NA, -5L), class = "data.frame")

Upvotes: 1

Simon Larsen

Reputation: 742

Essentially what you want to do is to convert the right side of the data frame from long format to wide format. You can do this using the spread function in tidyr.

library(tidyr)
library(dplyr)

D <- read.table(header=TRUE, text="
ID  07  08  09  10  year balance
abc  0   0   0   0  09    2123.00
efg  0   0   0   0  09    780.4
xyz  0   0   0   0  07    24502.9
prq  0   0   0   0  10    123.3
mno  0   0   0   0  07    679")

D %>%
  select(ID, year, balance) %>%
  spread(year, balance, fill=0) %>%
  bind_cols(D[,c("year","balance")])

#>    ID       7      9    10 year balance
#> 1 abc     0.0 2123.0   0.0    9  2123.0
#> 2 efg     0.0  780.4   0.0    9   780.4
#> 3 mno   679.0    0.0   0.0    7 24502.9
#> 4 prq     0.0    0.0 123.3   10   123.3
#> 5 xyz 24502.9    0.0   0.0    7   679.0

Note: the year 08 is missing from the output because it is missing in your example data.

Upvotes: 3

Zahiro Mor

Reputation: 1718

you can use 4 lines:

df$`07` <- ifelse(test = df$year=='07',yes = df$balance, no=0)
df$`08` <- ifelse(test = df$year=='08',yes = df$balance, no=0) 
df$`09` <- ifelse(test = df$year=='09',yes = df$balance, no=0)
df$`10` <- ifelse(test = df$year=='10',yes = df$balance, no=0)

I think it'll work superfast comparing to a loop

Upvotes: 0

Apply a function on a data-frame and return a data-frame

Answers (4)

Related Questions