Reputation: 101

How can I compare column names of two separate data frames in R?

I have two data frames in R with epigenetic data. To use one of them as a training set and the other as a test set with the glmnet package, their columns have to match. As both data frames contain more than 800,000 columns, I'm looking for a way to compare the column names of the two data frames so that I can delete the columns they don't have in common. So far I have only found packages and functions that compare the rows of two data frames. As an example, I'm looking for something like this:

df1
participant_code cg123  cg122  cg121  cg120

df2
participant_code cg123  cg122  cg121  cg119

The function would then give me, e.g., a table showing which column names differ:

colname 5 differs

Upvotes: 5

Answers (4)

Haraldur Karlsson

Reputation: 11

You could try using the inspectdf package. There is also comparedf in the arsenal package.

Upvotes: 0

stats_guy

Reputation: 717

You are looking for the intersection of the column names of the two data frames. You can simply use intersect to achieve what you want. First, extract the names of both data frames. Then use intersect. The result contains the column names that are present in both data frames. Use this vector to subset the initial data frames and you're done.

# define data frames with dummy data
df1 <- data.frame(participant_code = 1,
                  cg123            = 2,
                  cg122            = 3, 
                  cg121            = 4,
                  cg120            = 5)

df2 <- data.frame(participant_code = 6,
                  cg123            = 7,
                  cg122            = 8, 
                  cg121            = 9,
                  cg119            = 10)

# extract column names of the data frames
cols_df_1 <- names(df1)
cols_df_2 <- names(df2)

# find the intersection of both column name vectors
cols_intersection <- intersect(cols_df_1, cols_df_2)

# subset the initial data frames
df1_sub <- df1[,cols_intersection]
df2_sub <- df2[,cols_intersection]

# print to console and see result
df1_sub
#participant_code cg123 cg122 cg121
#               1     2     3     4

df2_sub
#participant_code cg123 cg122 cg121
#               6     7     8     9
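
To also see which names are not shared (which is closer to what the question asks for), setdiff on the same name vectors should do it. A minimal sketch that rebuilds the dummy df1 and df2 from above; the variable names only_in_df1 and only_in_df2 are just for illustration:

```r
# same dummy data as above
df1 <- data.frame(participant_code = 1, cg123 = 2, cg122 = 3, cg121 = 4, cg120 = 5)
df2 <- data.frame(participant_code = 6, cg123 = 7, cg122 = 8, cg121 = 9, cg119 = 10)

# names that appear in df1 but not in df2, and vice versa
only_in_df1 <- setdiff(names(df1), names(df2))
only_in_df2 <- setdiff(names(df2), names(df1))

only_in_df1
# [1] "cg120"
only_in_df2
# [1] "cg119"
```

Since this only works on the two character vectors of names, it should stay cheap even with 800,000 columns.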

Upvotes: 5

user10917479

Reputation:

This might not work the best for a huge data frame, but I have recently become a fan of compare() from the new waldo package.

This will show an output of differences between the two. Again, might be indecipherable for 800k length vectors, but I thought it was worth pointing out.

library(waldo)

compare(names(df1), names(df2))

Upvotes: 2

Ronak Shah

Reputation: 389325

You can use intersect to get the columns common to both data frames.

get_common_cols <- function(df1, df2)  intersect(names(df1), names(df2))

You can pass both data frames to this function to get the common columns, and then use the result to subset the data frames:

common_cols <- get_common_cols(data1, data2)
data1 <- data1[, common_cols]
data2 <- data2[, common_cols]
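
Because intersect keeps the order of its first argument, both subsets should end up with their columns in the same order, even if the original data frames ordered them differently. A quick sanity check, sketched with small made-up data1 and data2 (dummy values, not from the question); drop = FALSE is added here as a precaution so the result stays a data frame even if only one column is shared:

```r
# dummy data with the shared columns in a different order
data1 <- data.frame(participant_code = 1, cg123 = 2, cg120 = 3)
data2 <- data.frame(cg119 = 4, cg123 = 5, participant_code = 6)

common_cols <- intersect(names(data1), names(data2))
data1 <- data1[, common_cols, drop = FALSE]
data2 <- data2[, common_cols, drop = FALSE]

# stops with an error if the column names or their order differ
stopifnot(identical(names(data1), names(data2)))

names(data2)
# [1] "participant_code" "cg123"
```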

Upvotes: 3
