tnerbusas
tnerbusas

Reputation: 91

Writing nested for loop to concatenate rows that share a key in a dataframe in R

So, I have a key dataframe of IDs

IDs <- data.frame(c(123,456,789))

I also have a dataframe of split SQL queries that need to be concatenated (there was an issue of the queries truncating due to their length, so I had to split them into pieces)

splitQueriesdf <- data.frame(ID = c(123,123,123,456,456,456,789,789,789), SplitQUery = c("SELECT", "* FROM", "tablename1","SELECT", "* FROM", "tablename2","SELECT", "* FROM", "tablename3"))

I need to write a loop that concatenates the queries by the IDs that are present in the IDs dataframe into a 3rd dataframe. The nrows(IDs) will vary, so I need that too be dynamic as well

So I need the 3rd dataframe to look like:

    ID  FullQuery
 1  123 SELECT * FROM tablename1
 2  456 SELECT * FROM tablename2
 3  789 SELECT * FROM tablename3

I have an idea that I need a loop that goes through the length of IDs -- so 3 times, and a nested loop that appends the correct rows together, but I'm fairly new to R, and I'm getting stuck. Here's what I have so far:

dataframe3= NULL
for (index in 1:nrow(IDs)){
   for (index2 in 1:nrow(splitQueriesdf)){ 
     dataframe3[index] <- rbind(splitQueriesdf[index2,4])
  }
}

Any help is much appreciated!

Upvotes: 2

Views: 149

Answers (3)

Nicolas2
Nicolas2

Reputation: 2210

With tidyverse:

splitQueriesdf %>% group_by(ID) %>% summarise(query=paste(SplitQUery,collapse=" "))
## A tibble: 3 x 2
#     ID query                   
#  <dbl> <chr>                   
#1   123 SELECT * FROM tablename1
#2   456 SELECT * FROM tablename2
#3   789 SELECT * FROM tablename3

Upvotes: 1

User2321
User2321

Reputation: 3062

Using the data table package you can do:

library(data.table)
IDs <- data.frame(ID = c(123,456,789))
splitQueriesdf <- data.frame(ID = c(123,123,123,456,456,456,789,789,789), SplitQUery = c("SELECT", "* FROM", "tablename1","SELECT", "* FROM", "tablename2","SELECT", "* FROM", "tablename3"))

setDT(splitQueriesdf)
splitQueriesdf[ID %in% IDs$ID, paste(SplitQUery, collapse = " "), by = .(ID)]

   ID                FullQuery
1: 123 SELECT * FROM tablename1
2: 456 SELECT * FROM tablename2
3: 789 SELECT * FROM tablename3

Upvotes: 1

akrun
akrun

Reputation: 886948

One option is aggregate from base R to group by 'ID' and then paste the 'SplitQUery' column

splitQueriesdf$SplitQUery <- as.character(splitQueriesdf$SplitQUery)
aggregate(cbind(FullQuery = SplitQUery) ~ ID, splitQueriesdf,
          FUN = paste, collapse = ' ')
#  ID                FullQuery
#1 123 SELECT * FROM tablename1
#2 456 SELECT * FROM tablename2
#3 789 SELECT * FROM tablename3

Upvotes: 3

Related Questions