Reputation: 223
Hi I have a dataset that looks like this
bankname bankid year totass invloc1 invamt1 invloc2 invamt2 invloc3 invamt3
Bank A 1 1881 244789 Philadelphia 7250.32 New York 20218.20 Philadelphia 29513.4
Bank B 2 1881 195755 Pittsburgh 10243.60 NA 1851.51 NA NA
Bank C 3 1881 107736 New York 13357.80 Wilkes-Barre 17761.20 NA NA
Bank D 4 1881 170600 Philadelphia 3.35 Philadelphia 2.00 NA NA
Bank E 5 1881 32000000 New York 351266.00 New York 314012.00 NA
but I want to create a new variable called NY_tot
using invloc and invamt variables for each bank. For each bank, if their invloc is New York, then sum invamt. invloc1 and invamt1 go together. Hence, I want this dataset to look like this.
bankname bankid year totass invloc1 invamt1 invloc2 invamt2 invloc3 invamt3 NY_tot
Bank A 1 1881 244789 Philadelphia 7250.32 New York 20218.20 Philadelphia 29513.4 20218.20
Bank B 2 1881 195755 Pittsburgh 10243.60 NA 1851.51 NA NA 0
Bank C 3 1881 107736 New York 13357.80 Wilkes-Barre 17761.20 NA NA 13357.80
Bank D 4 1881 170600 Philadelphia 3.35 Philadelphia 2.00 NA NA 0
Bank E 5 1881 32000000 New York 351266.00 New York 314012.00 NA 665278
Here is the dataset I am using
bankname <- c("Bank A","Bank B","Bank C","Bank D","Bank E")
bankid <- c( 1, 2, 3, 4, 5)
year<- c( 1881, 1881, 1881, 1881, 1881)
totass <- c(244789, 195755, 107736, 170600, 32000000)
invloc1 <-c("Philadelphia","Pittsburgh","New York","Philadelphia","New York")
invamt1<-c(7250.32,10243.6,13357.8,3.35,351266)
invloc2<-c("New York","NA","Wilkes-Barre","Philadelphia","New York")
invamt2<-c(20218.2,1851.51,17761.2,2,314012)
invloc3<-c("Philadelphia","NA","NA","NA","")
invamt3<-c(29513.4,NA,NA,NA,NA)
bankdata<-data.frame(bankname, bankid,year,totass, invloc1, invamt1, invloc2, invamt2, invloc3, invamt3)
When I tried the following code:
Change factor variables (invloc) to character
i <- sapply(bankdata, is.factor)
bankdata[i] <- lapply(bankdata[i], as.character)
Then create a new variable
for(i in 1:nrow(bankdata)){
bankdata$NY_tot<-0
for(j in 1:3){
if((!is.na(bankdata[i,paste("invloc",j,sep="")])) && (bankdata[i,paste("invloc",j,sep="")]=="New York")){
if (!is.na(bankdata[i,paste("invamt",j,sep="")])){
bankdata$NY_tot[i]<-bankdata$NY_tot[i]+bankdata[i,paste("invamt",j,sep="")]
}
}
}
}
I get 0s in my NY_tot
variable. Can you tell me why?
Thank you in advance!
Upvotes: 0
Views: 129
Reputation: 3991
As others have said in comments, you do not need to mess around with for loops to do this. R has a ton of fancy built-in functions to quickly handle problems of this sort.
In this case, your solution is ifelse
. I'm somewhat confused as to which columns you want to work with, but try something like this:
bankdata$NY_tot=ifelse(bankdata$invloc1=="New York",sum(bankdata$invamt1,bankdata$invamt2),NA)
What's going on here? ifelse
works like this:
ifelse(conition, value_if_true, value_if_false)
So, in your case the function checks whether the value of invloc1
is "New York"
, returns a sum if it is and NA
if it is not. The nicest part is that it automatically does this row-by-row, so you don't need to manually iterate over the dataframe, which is what is causing problems in your code above.
Edit: as suggested by @Richard Scriven below, you can avoid typing the name of the dataframe four times by using with
or within
, eg:
bankdata<-within(bankdata, NY_tot=ifelse(invloc1=="New York"),sum(invamt1,invamt2),NA)
This is a nifty trick that I will use for the rest of my life, which basically tells R that all variable names given are associated with bankdata
so you don't have to keep typing it out.
Upvotes: 1