Reputation: 320
I have a data frame like this
ID 07 08 09 10 year balance
abc 0 0 0 0 09 2123.00
efg 0 0 0 0 09 780.4
xyz 0 0 0 0 07 2402.9
prq 0 0 0 0 10 123.3
mno 0 0 0 0 07 679
I need to fill columns 07, 08, 09 and 10 based on column "year" and values in balance. For each ID, the column corresponding to value in column year is filled with value in balance. Apply this row-by-row.
For example, for 1st row, year is 09, so column 09 for that ID is filled with value 2123.00. Rest all year values remain 0.
For 3rd row, value of 24502.9 is filled in column 07 since its year value is 07. And so on row by row..
I output should come like this
ID 07 08 09 10 year balance
abc 0 0 2123.00 0 09 2123.00
efg 0 0 780.4 0 09 780.4
xyz 2402.9 0 0 0 07 2402.9
prq 0 0 0 123.3 10 123.3
mno 679 0 0 0 07 679
PS: I have already written a for loop for this. I need something faster than for loop. I'm actually dealing with thousands of data. I dont know if there is anything similar to apply that returns a data-frame
Upvotes: 0
Views: 60
Reputation: 450
You can use the data.table
and reshape2
packages to do this.
You could use ID, year, balance columns of your data.frame and dcast with ID as the row and year in the columns and balance in the value
library(reshape2)
library(data.table)
final_output<-dcast(setDT(df),ID~year, value.var="balance")
If you want to re-order your columns you could use the snippet from the reference below: Reordering dcast data frame
final_output<-dcast(setDT(df),ID~reorder(year,year), value.var="balance")
Upvotes: 1
Reputation: 72613
I'm sure you want this
do.call(rbind, lapply(1:nrow(df1), function(i) {
df1[i, df1[i, 6]] <- df1[i, 7]
df1[i, ]
}))
Yields
ID 07 08 09 10 year balance
1 abc 0.0 0 2123.0 0.0 09 2123.0
2 efg 0.0 0 780.4 0.0 09 780.4
3 xyz 2402.9 0 0.0 0.0 07 2402.9
4 prq 0.0 0 0.0 123.3 10 123.3
5 mno 679.0 0 0.0 0.0 07 679.0
Data
df1 <- structure(list(ID = structure(c(1L, 2L, 5L, 4L, 3L), .Label = c("abc",
"efg", "mno", "prq", "xyz"), class = "factor"), `07` = c(0L,
0L, 0L, 0L, 0L), `08` = c(0L, 0L, 0L, 0L, 0L), `09` = c(0L, 0L,
0L, 0L, 0L), `10` = c(0L, 0L, 0L, 0L, 0L), year = c("09", "09",
"07", "10", "07"), balance = c(2123, 780.4, 2402.9, 123.3, 679
)), row.names = c(NA, -5L), class = "data.frame")
Upvotes: 1
Reputation: 742
Essentially what you want to do is to convert the right side of the data frame from long format to wide format. You can do this using the spread
function in tidyr
.
library(tidyr)
library(dplyr)
D <- read.table(header=TRUE, text="
ID 07 08 09 10 year balance
abc 0 0 0 0 09 2123.00
efg 0 0 0 0 09 780.4
xyz 0 0 0 0 07 24502.9
prq 0 0 0 0 10 123.3
mno 0 0 0 0 07 679")
D %>%
select(ID, year, balance) %>%
spread(year, balance, fill=0) %>%
bind_cols(D[,c("year","balance")])
#> ID 7 9 10 year balance
#> 1 abc 0.0 2123.0 0.0 9 2123.0
#> 2 efg 0.0 780.4 0.0 9 780.4
#> 3 mno 679.0 0.0 0.0 7 24502.9
#> 4 prq 0.0 0.0 123.3 10 123.3
#> 5 xyz 24502.9 0.0 0.0 7 679.0
Note: the year 08 is missing from the output because it is missing in your example data.
Upvotes: 3
Reputation: 1718
you can use 4 lines:
df$`07` <- ifelse(test = df$year=='07',yes = df$balance, no=0)
df$`08` <- ifelse(test = df$year=='08',yes = df$balance, no=0)
df$`09` <- ifelse(test = df$year=='09',yes = df$balance, no=0)
df$`10` <- ifelse(test = df$year=='10',yes = df$balance, no=0)
I think it'll work superfast comparing to a loop
Upvotes: 0