Can I select the rows in my data set that have the same value in 2 of the columns?

Question

I have a data set with 40 columns and 2000 rows. The values in 2 of the columns are important: I want to select the rows that have the same value in these 2 columns. A small sample of my data looks like this:

2 3 4 5 6 3 23 32
4 3 4 1 0 5 6  43
4 4 3 22 1  2  23

Suppose I want to select the rows that have the same value in the first and third columns. In this sample, I want the second row to be stored in a new data set.

eastclintw00d · Accepted Answer

I gather from your comments that the numbers in that data frame are stored as factors. A factor stores each value as an integer code that points into its set of levels, so when the console prints the level 4, the underlying code is not necessarily 4. In general, two factors can only be compared with each other if they have the same set of levels; otherwise R stops with an error. To see the internal codes of your first column, use as.numeric(df[[1]]).
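A small sketch of the point above, using made-up factor values rather than your data: a factor that prints as 4 and 10 stores the codes 2 and 1, and comparing it with a factor that has a different level set fails.

```r
# A factor whose labels look like numbers (made-up values)
a <- factor(c("4", "10"))
print(levels(a))        # "10" "4"  (levels are sorted as strings)
a_codes <- as.numeric(a)
print(a_codes)          # 2 1  (the internal codes, not 4 and 10)

# A second factor with a different level set
b <- factor(c("4", "3"))

# Comparing factors whose level sets differ raises an error;
# tryCatch captures the error message instead of stopping the script
res <- tryCatch(a == b, error = function(e) conditionMessage(e))
print(res)
```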

Now to the solution of your problem. First convert the factors in columns 1 and 3 (or in all columns) to numeric values by way of their levels. Instructions for that can be found here.
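A sketch of that conversion, using a made-up factor `f` and a small hypothetical data frame `df`. `as.numeric(f)` alone returns the internal codes, while indexing the numeric levels by the factor returns the printed values. The `lapply` line is one way to convert every factor column at once; it assumes all factor columns hold numeric-looking labels (any other label would become NA with a warning).

```r
f <- factor(c("4", "10", "4"))

wrong <- as.numeric(f)                  # internal codes: 2 1 2
right <- as.numeric(levels(f))[f]       # values: 4 10 4
also_right <- as.numeric(as.character(f))  # same values, but converts every element instead of each level once

# Hypothetical data frame with factor columns
df <- data.frame(x = factor(c("2", "4")), y = factor(c("3", "3")))

# Convert every factor column; df[] <- keeps the result a data frame
df[] <- lapply(df, function(col) {
  if (is.factor(col)) as.numeric(levels(col))[col] else col
})
str(df)
```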

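## convert factor levels to numeric values
## (indexing the numeric levels by the factor uses its integer codes)
df[[1]] <- as.numeric(levels(df[[1]]))[df[[1]]]
df[[3]] <- as.numeric(levels(df[[3]]))[df[[3]]]

## keep the rows where column 1 equals column 3
df[df[[1]] == df[[3]], ]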
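A self-contained run of the steps above, sketched on the first three columns of the sample rows, stored as factors to mimic the situation described. Only three columns are used because the sample rows have different lengths. This version wraps the condition in `which()` so that rows where either value is NA are dropped instead of showing up as all-NA rows.

```r
# First three columns of the sample rows, stored as factors
df <- data.frame(
  V1 = factor(c("2", "4", "4")),
  V2 = factor(c("3", "3", "4")),
  V3 = factor(c("4", "4", "3"))
)

# Convert the two columns of interest from factor levels to numbers
df[[1]] <- as.numeric(levels(df[[1]]))[df[[1]]]
df[[3]] <- as.numeric(levels(df[[3]]))[df[[3]]]

# Keep rows where column 1 equals column 3; which() drops NA comparisons
matches <- df[which(df[[1]] == df[[3]]), ]
print(matches)
```

Only the second row (4, 3, 4) is kept, which matches the expected result in the question.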
