user3188390
user3188390

Reputation: 613

How to avoid 'for loop' to work on a CSV file faster in R?

My CSV files are like as follows -

Data1.csv

BusinessNeedParent,BusinessNeedChild,Identifier
a1,b1,45
a2,b2,60
a3,b3,56

Data2.csv

AdvertiserName,BusinessNeedNumber,State,City
worker,45,Calif,Los angeles
workplace,45,Calif,San Diego
platoon,60,Connec,Bridgeport
teracota,56,New York,Albany

My desired output:

 AdvertiserName,BusinessNeedParent,BusinessNeedChild,State,City
 worker,a1,b1,Calif,Los angeles
 workplace,a1,b1,Calif,San Diego
 platoon,a2,b2,Connec,Bridgeport
 teracota,a3,b3,New York,Albany

So it has to match the Identifier with the BusinessNeedNumber and generate the data above CSV file. So far my code is like

record <- read.csv("Data1.csv",header=TRUE)
businessneedinformation <- read.csv("Data2.csv",header=TRUE)

for(i in record$BusinessNeedNumber){
  if(i %in% businessneedinformation$Identifier){ 
   keyword <- "NA"
  busparent <- businessneedinformation$BusinessNeedParent[which(businessneedinformation$Identifier==i)]
    buschild <- businessneedinformation$BusinessNeedChild[which(businessneedinformation$Identifier==i)]
   replacementbusparent <- gsub(pattern=",",replacement="",x=busparent)
   replacementbuschild <- gsub(pattern=",",replacement="",x=buschild)
   campname <- paste("cat","|","bus","|","en-us","|",(tolower(as.character(replacementbusparent[1]))),"|",(tolower(as.character(replacementbuschild[1]))),sep="")
   thislist <- data.frame(Keyword = keyword,BusinessNeedParent = businessneedinformation$BusinessNeedParent[which(businessneedinformation$Identifier==i)],BusinessNeedChild = businessneedinformation$BusinessNeedChild[which(businessneedinformation$Identifier==i)],Campaign=campname)
  }
 List <- rbind(List, thislist) 
 }

As I am using a for loop, it is very slow, for almost 100000 entries it is taking a long time, what is the way to achieve it faster using indexing in R.

Upvotes: 0

Views: 66

Answers (1)

Prasanna Nandakumar
Prasanna Nandakumar

Reputation: 4335

> zz <- "BusinessNeedParent,BusinessNeedChild,Identifier
a1,b1,45
a2,b2,60
a3,b3,56"
> Data <- read.table(text=zz, header = TRUE,sep=',')
> Data
  BusinessNeedParent BusinessNeedChild Identifier
1                 a1                b1         45
2                 a2                b2         60
3                 a3                b3         56
> zz1 <- "AdvertiserName,BusinessNeedNumber,State,City
worker,45,Calif,Los angeles
workplace,45,Calif,San Diego
platoon,60,Connec,Bridgeport
teracota,56,New York,Albany"
> Data1 <- read.table(text=zz1, header = TRUE,sep=',')
> Data1
  AdvertiserName BusinessNeedNumber    State        City
1         worker                 45    Calif Los angeles
2      workplace                 45    Calif   San Diego
3        platoon                 60   Connec  Bridgeport
4       teracota                 56 New York      Albany
> m <- merge(Data,Data1,by.x="Identifier",by.y="BusinessNeedNumber")
> m[,c(4,2,3,5,6)]
  AdvertiserName BusinessNeedParent BusinessNeedChild    State        City
1         worker                 a1                b1    Calif Los angeles
2      workplace                 a1                b1    Calif   San Diego
3       teracota                 a3                b3 New York      Albany
4        platoon                 a2                b2   Connec  Bridgeport
write.csv(m, file = "demoMerge.csv")  

or you can use

m1 <- Reduce(function(old, new) { merge(old, new, by.x='Identifier', by.y='BusinessNeedNumber') }, list_of_files)
> m1
  Identifier BusinessNeedParent BusinessNeedChild AdvertiserName    State        City
1         45                 a1                b1         worker    Calif Los abngles
2         45                 a1                b1      workplace    Calif   San Diego
3         56                 a3                b3       teracota New York      Albany
4         60                 a2                b2        platoon   Connec  Bridgeport

Upvotes: 1

Related Questions