H Park
H Park

Reputation: 223

Creating a new variable

Hi I have a dataset that looks like this

bankname    bankid  year    totass  invloc1 invamt1 invloc2 invamt2 invloc3 invamt3   
Bank A  1   1881    244789  Philadelphia    7250.32 New York    20218.20    Philadelphia    29513.4   
Bank B  2   1881    195755  Pittsburgh  10243.60    NA  1851.51 NA  NA   
Bank C  3   1881    107736  New York    13357.80    Wilkes-Barre    17761.20    NA  NA   
Bank D  4   1881    170600  Philadelphia    3.35    Philadelphia    2.00    NA  NA   
Bank E  5   1881    32000000    New York    351266.00   New York    314012.00       NA   

but I want to create a new variable called NY_tot using invloc and invamt variables for each bank. For each bank, if their invloc is New York, then sum invamt. invloc1 and invamt1 go together. Hence, I want this dataset to look like this.

bankname    bankid  year    totass  invloc1 invamt1 invloc2 invamt2 invloc3 invamt3 NY_tot   
Bank A  1   1881    244789  Philadelphia    7250.32 New York    20218.20    Philadelphia    29513.4 20218.20   
Bank B  2   1881    195755  Pittsburgh  10243.60    NA  1851.51 NA  NA  0   
Bank C  3   1881    107736  New York    13357.80    Wilkes-Barre    17761.20    NA  NA  13357.80   
Bank D  4   1881    170600  Philadelphia    3.35    Philadelphia    2.00    NA  NA  0   
Bank E  5   1881    32000000    New York    351266.00   New York    314012.00       NA  665278   

Here is the dataset I am using

bankname <- c("Bank A","Bank B","Bank C","Bank D","Bank E")
bankid <- c( 1, 2,  3,  4,  5)
year<- c( 1881, 1881,   1881,   1881,   1881)
totass  <- c(244789,    195755, 107736, 170600, 32000000)
invloc1 <-c("Philadelphia","Pittsburgh","New York","Philadelphia","New York")
invamt1<-c(7250.32,10243.6,13357.8,3.35,351266)
invloc2<-c("New York","NA","Wilkes-Barre","Philadelphia","New York")
invamt2<-c(20218.2,1851.51,17761.2,2,314012)
invloc3<-c("Philadelphia","NA","NA","NA","")
invamt3<-c(29513.4,NA,NA,NA,NA)
bankdata<-data.frame(bankname, bankid,year,totass, invloc1, invamt1, invloc2, invamt2, invloc3, invamt3)

When I tried the following code:

Change factor variables (invloc) to character

i <- sapply(bankdata, is.factor)
bankdata[i] <- lapply(bankdata[i], as.character)  

Then create a new variable

for(i in 1:nrow(bankdata)){
 bankdata$NY_tot<-0
for(j in 1:3){
if((!is.na(bankdata[i,paste("invloc",j,sep="")])) && (bankdata[i,paste("invloc",j,sep="")]=="New York")){
  if (!is.na(bankdata[i,paste("invamt",j,sep="")])){
    bankdata$NY_tot[i]<-bankdata$NY_tot[i]+bankdata[i,paste("invamt",j,sep="")]
      }
    }
  }
}

I get 0s in my NY_tot variable. Can you tell me why?

Thank you in advance!

Upvotes: 0

Views: 129

Answers (1)

Joe
Joe

Reputation: 3991

As others have said in comments, you do not need to mess around with for loops to do this. R has a ton of fancy built-in functions to quickly handle problems of this sort.

In this case, your solution is ifelse. I'm somewhat confused as to which columns you want to work with, but try something like this:

bankdata$NY_tot=ifelse(bankdata$invloc1=="New York",sum(bankdata$invamt1,bankdata$invamt2),NA)

What's going on here? ifelse works like this:

ifelse(conition, value_if_true, value_if_false)

So, in your case the function checks whether the value of invloc1 is "New York", returns a sum if it is and NA if it is not. The nicest part is that it automatically does this row-by-row, so you don't need to manually iterate over the dataframe, which is what is causing problems in your code above.

Edit: as suggested by @Richard Scriven below, you can avoid typing the name of the dataframe four times by using with or within, eg:

bankdata<-within(bankdata, NY_tot=ifelse(invloc1=="New York"),sum(invamt1,invamt2),NA)

This is a nifty trick that I will use for the rest of my life, which basically tells R that all variable names given are associated with bankdata so you don't have to keep typing it out.

Upvotes: 1

Related Questions