Adding elements to a dataframe in R using Rbind

Question

I am creating a dataframe with 3 columns (char, char, int) called Alleles_df from df1 and df2 using:

Alleles_df <- data.frame('refsnp_id'=character(),'allele'=character(), 
   'chrom_start' = integer(),stringsAsFactors = F)

for (i in 1:nrow(df1)){    
   Alleles_df[i,] <- df1[(df1$col1[i]==df2$col1[i]),]
}

For some values of i, I receive the following error:

Error in x[[jj]][iseq] <- vjj : replacement has length zero

This happens because the col1 values of df1 and df2 do not match for certain values of i. How do I bind a row with c("NA","NA",0) in those situations? I would greatly appreciate your assistance!

df1 is data from an online server called biomart. df2 is what I generated manually. Each has 3 columns: Chromosome, Allele and BaseLocation.

    refsnp_id allele chrom_start
1 rs778598915  G/A/T    42693910
2  rs11541159    T/C    42693843
3 rs397514502    G/C    42693321
4 rs762949801    C/T    42693665
5 rs776304817  G/A/T    42693653

Salix · Accepted Answer

Explanation: The problem is actually the order of the [] subsets. In df1[i,][(df1$col1[i] == df2$col1[i]),], if row i of df1 doesn't have a matching col1, you get <0 rows> (or 0-length row.names). But in df1[(df1$col1[i]==df2$col1[i]),][i,], if there is no matching col1, the first subset also returns 0 rows; row i of that empty data frame does not exist, so R returns a data frame with a single NA-filled row of length 3.
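A small reproduction of both behaviours, using hypothetical toy data in which refsnp_id stands in for the col1 column of the question. It also suggests why the original loop fails: df1$col1[i] == df2$col1[i] is a single TRUE or FALSE, and a length-one logical row index is recycled over all rows, so FALSE selects nothing.

```r
# Hypothetical toy data; refsnp_id plays the role of col1
df1 <- data.frame(refsnp_id   = c("rs1", "rs2", "rs3"),
                  allele      = c("G/A", "T/C", "G/C"),
                  chrom_start = c(100L, 200L, 300L),
                  stringsAsFactors = FALSE)
df2 <- data.frame(refsnp_id = c("rs1", "rsX", "rs3"),
                  stringsAsFactors = FALSE)

i <- 2
cond <- df1$refsnp_id[i] == df2$refsnp_id[i]  # a single FALSE for this row

# A length-one FALSE index is recycled over all rows, so no row is selected;
# assigning such a 0-row result into Alleles_df[i, ] is what fails in the loop
no_rows <- df1[cond, ]
print(nrow(no_rows))        # 0

# Taking row i of that empty data frame yields one row filled with NA
na_row <- df1[cond, ][i, ]
print(nrow(na_row))         # 1
print(all(is.na(na_row)))   # TRUE
```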

Edited explanation: Since you edited your question: the problem is that not every row of df1 has a col1 value matching the col1 of the same row in df2, which is why you get 0 rows. Adding [i,] after the subset (df1[(df1$col1[i] == df2$col1[i]), ][i, ]) will still give an empty row of length 3 (all NAs), so your loop won't stop, but you could also drop the loop altogether (see below).
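If you would rather keep the loop and write the placeholder row the question asks about, a minimal sketch could look like this. It assumes df1 and df2 have the same number of rows, uses hypothetical toy data with refsnp_id in place of col1, and writes real NA values plus 0 instead of the strings "NA" (a design choice, not something from the original answer).

```r
# Hypothetical toy data; refsnp_id plays the role of col1, and both
# data frames are assumed to have the same number of rows
df1 <- data.frame(refsnp_id   = c("rs1", "rs2", "rs3"),
                  allele      = c("G/A", "T/C", "G/C"),
                  chrom_start = c(100L, 200L, 300L),
                  stringsAsFactors = FALSE)
df2 <- data.frame(refsnp_id = c("rs1", "rsX", "rs3"),
                  stringsAsFactors = FALSE)

n <- nrow(df1)
# Preallocate the result instead of growing it inside the loop
Alleles_df <- data.frame(refsnp_id   = character(n),
                         allele      = character(n),
                         chrom_start = integer(n),
                         stringsAsFactors = FALSE)

for (i in seq_len(n)) {
  # isTRUE() treats an NA comparison as "no match"
  if (isTRUE(df1$refsnp_id[i] == df2$refsnp_id[i])) {
    Alleles_df[i, ] <- df1[i, ]
  } else {
    # Placeholder row: real NA values and 0 rather than the strings "NA"
    Alleles_df[i, ] <- list(NA_character_, NA_character_, 0L)
  }
}
print(Alleles_df)
```

Row 2 ends up as the placeholder (NA, NA, 0) while rows 1 and 3 are copied from df1.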

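If you really want to keep your loop, you can remove the all-NA rows afterwards with Alleles_df <- Alleles_df[rowSums(is.na(Alleles_df)) != ncol(Alleles_df), ]. Avoid the -which(...) form here: when no row is entirely NA, which() returns integer(0), and Alleles_df[-integer(0), ] drops every row.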
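A short sketch of the empty-row cleanup on hypothetical data, showing why negating a logical vector is safer than negating which() when no row is entirely NA.

```r
# Hypothetical result with no all-NA rows
Alleles_df <- data.frame(refsnp_id   = c("rs1", "rs3"),
                         allele      = c("G/A", "G/C"),
                         chrom_start = c(100L, 300L),
                         stringsAsFactors = FALSE)

all_na <- rowSums(is.na(Alleles_df)) == ncol(Alleles_df)  # FALSE FALSE

# Pitfall: which() returns integer(0), and x[-integer(0), ] selects no rows
broken <- Alleles_df[-which(all_na), ]
print(nrow(broken))   # 0

# Safer: index with the negated logical vector
kept <- Alleles_df[!all_na, ]
print(nrow(kept))     # 2

# The logical form still drops a genuinely all-NA row
with_na <- rbind(Alleles_df,
                 data.frame(refsnp_id = NA_character_, allele = NA_character_,
                            chrom_start = NA_integer_, stringsAsFactors = FALSE))
cleaned <- with_na[rowSums(is.na(with_na)) != ncol(with_na), ]
print(nrow(cleaned))  # 2
```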

Solution: If df1 and df2 have the same number of rows and any matching alleles are always on the same row in both, df1[df1$col1 == df2$col1, ] would give the same results faster, without a loop.
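A minimal sketch of the vectorized row-by-row comparison on hypothetical toy data (refsnp_id stands in for col1). Wrapping the comparison in which() is an addition here: it skips rows where either ID is NA instead of returning NA-filled rows.

```r
# Hypothetical toy data with the same number of rows
df1 <- data.frame(refsnp_id   = c("rs1", "rs2", "rs3"),
                  allele      = c("G/A", "T/C", "G/C"),
                  chrom_start = c(100L, 200L, 300L),
                  stringsAsFactors = FALSE)
df2 <- data.frame(refsnp_id = c("rs1", "rsX", "rs3"),
                  stringsAsFactors = FALSE)

# == compares element by element, so the row counts must agree
stopifnot(nrow(df1) == nrow(df2))

# Keep the rows of df1 whose ID equals the ID on the same row of df2
Alleles_df <- df1[which(df1$refsnp_id == df2$refsnp_id), ]
print(Alleles_df$refsnp_id)   # "rs1" "rs3"
```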

Better solution: If df1 and df2 don't have the same number of rows, or if you want all the rows with matching alleles even when they aren't on the same row in both data frames (for example, if 'rs778598915' on row 1 of df1 could be on row 5 of df2), you can select the matching rows of df1 and rbind them to Alleles_df without a loop:

Alleles_df <- rbind(Alleles_df, df1[df1$col1 %in% df2$col1, ])
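A sketch of the loop-free selection, using the first three df1 rows from the sample in the question and a hypothetical df2 of a different length and order ("rs0000000" is made up; refsnp_id stands in for col1). It also shows match() with nomatch = 0 as an alternative when you want the rows in df2's order.

```r
# df1 rows taken from the sample shown in the question
df1 <- data.frame(refsnp_id   = c("rs778598915", "rs11541159", "rs397514502"),
                  allele      = c("G/A/T", "T/C", "G/C"),
                  chrom_start = c(42693910L, 42693843L, 42693321L),
                  stringsAsFactors = FALSE)
# Hypothetical df2: different length and order; "rs0000000" is made up
df2 <- data.frame(refsnp_id = c("rs397514502", "rs0000000",
                                "rs778598915", "rs762949801"),
                  stringsAsFactors = FALSE)

# Empty result, defined as in the question
Alleles_df <- data.frame(refsnp_id = character(), allele = character(),
                         chrom_start = integer(), stringsAsFactors = FALSE)

# All df1 rows whose ID appears anywhere in df2, kept in df1's order
Alleles_df <- rbind(Alleles_df, df1[df1$refsnp_id %in% df2$refsnp_id, ])
print(Alleles_df$refsnp_id)   # "rs778598915" "rs397514502"

# Alternative: the same rows in df2's order; nomatch = 0 drops IDs absent from df1
in_df2_order <- df1[match(df2$refsnp_id, df1$refsnp_id, nomatch = 0), ]
print(in_df2_order$refsnp_id) # "rs397514502" "rs778598915"
```

%in% is the simpler choice when order doesn't matter. The match() version follows df2's order, but it repeats a df1 row if its ID occurs more than once in df2.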
