Reputation: 613
My CSV files are like as follows -
Data1.csv
BusinessNeedParent,BusinessNeedChild,Identifier
a1,b1,45
a2,b2,60
a3,b3,56
Data2.csv
AdvertiserName,BusinessNeedNumber,State,City
worker,45,Calif,Los angeles
workplace,45,Calif,San Diego
platoon,60,Connec,Bridgeport
teracota,56,New York,Albany
My desired output:
AdvertiserName,BusinessNeedParent,BusinessNeedChild,State,City
worker,a1,b1,Calif,Los angeles
workplace,a1,b1,Calif,San Diego
platoon,a2,b2,Connec,Bridgeport
teracota,a3,b3,New York,Albany
So it has to match the Identifier with the BusinessNeedNumber and generate the data above CSV file. So far my code is like
record <- read.csv("Data1.csv",header=TRUE)
businessneedinformation <- read.csv("Data2.csv",header=TRUE)
for(i in record$BusinessNeedNumber){
if(i %in% businessneedinformation$Identifier){
keyword <- "NA"
busparent <- businessneedinformation$BusinessNeedParent[which(businessneedinformation$Identifier==i)]
buschild <- businessneedinformation$BusinessNeedChild[which(businessneedinformation$Identifier==i)]
replacementbusparent <- gsub(pattern=",",replacement="",x=busparent)
replacementbuschild <- gsub(pattern=",",replacement="",x=buschild)
campname <- paste("cat","|","bus","|","en-us","|",(tolower(as.character(replacementbusparent[1]))),"|",(tolower(as.character(replacementbuschild[1]))),sep="")
thislist <- data.frame(Keyword = keyword,BusinessNeedParent = businessneedinformation$BusinessNeedParent[which(businessneedinformation$Identifier==i)],BusinessNeedChild = businessneedinformation$BusinessNeedChild[which(businessneedinformation$Identifier==i)],Campaign=campname)
}
List <- rbind(List, thislist)
}
As I am using a for loop, it is very slow, for almost 100000 entries it is taking a long time, what is the way to achieve it faster using indexing in R.
Upvotes: 0
Views: 66
Reputation: 4335
> zz <- "BusinessNeedParent,BusinessNeedChild,Identifier
a1,b1,45
a2,b2,60
a3,b3,56"
> Data <- read.table(text=zz, header = TRUE,sep=',')
> Data
BusinessNeedParent BusinessNeedChild Identifier
1 a1 b1 45
2 a2 b2 60
3 a3 b3 56
> zz1 <- "AdvertiserName,BusinessNeedNumber,State,City
worker,45,Calif,Los angeles
workplace,45,Calif,San Diego
platoon,60,Connec,Bridgeport
teracota,56,New York,Albany"
> Data1 <- read.table(text=zz1, header = TRUE,sep=',')
> Data1
AdvertiserName BusinessNeedNumber State City
1 worker 45 Calif Los angeles
2 workplace 45 Calif San Diego
3 platoon 60 Connec Bridgeport
4 teracota 56 New York Albany
> m <- merge(Data,Data1,by.x="Identifier",by.y="BusinessNeedNumber")
> m[,c(4,2,3,5,6)]
AdvertiserName BusinessNeedParent BusinessNeedChild State City
1 worker a1 b1 Calif Los angeles
2 workplace a1 b1 Calif San Diego
3 teracota a3 b3 New York Albany
4 platoon a2 b2 Connec Bridgeport
write.csv(m, file = "demoMerge.csv")
or you can use
m1 <- Reduce(function(old, new) { merge(old, new, by.x='Identifier', by.y='BusinessNeedNumber') }, list_of_files)
> m1
Identifier BusinessNeedParent BusinessNeedChild AdvertiserName State City
1 45 a1 b1 worker Calif Los abngles
2 45 a1 b1 workplace Calif San Diego
3 56 a3 b3 teracota New York Albany
4 60 a2 b2 platoon Connec Bridgeport
Upvotes: 1