Rikin

Reputation: 275

How to find correlation in a data set

I want to find the correlation between trip duration and age in the data set below. I am calling cor(age, df$tripduration), but it returns NA. How can I compute the correlation? I calculated "age" as follows:

age <- (2017-as.numeric(df$birth.year)) 

The trip duration (in seconds) is df$tripduration.

Below is the data. In the gender column, 1 means male and 2 means female.

tripduration    birth year  gender
439              1980        1
186              1984        1
442              1969        1
170              1986        1
189              1990        1
494              1984        1
152              1972        1
537              1994        1
509              1994        1
157              1985        2
1080             1976        2
239              1976        2
344              1992        2

Upvotes: 0

Views: 187

Answers (1)

Jim O.

Reputation: 1111

I think you are trying to subtract a data frame from a number, which would not work. This worked for me:

birth <- df$birth.year
year <- 2017
age <- year - birth
cor(df$tripduration, age)
# [1] 0.08366848

# Sanity check: correlating with birth year directly gives the same magnitude with the opposite sign
cor(df$tripduration, df$birth.year)
# [1] -0.08366848

By the way, please include easily reproducible data in the question, so that people can copy and paste it straight into R. This makes it much easier to get an answer.
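As a rough sketch of what reproducible data could look like, the 13 rows posted in the question can be typed into a data.frame directly. The column name birth.year is an assumption taken from the question's code (the posted table header reads "birth year"); on a real data frame, dput(df) or dput(head(df, 20)) would print a similar copy-pasteable definition.

```r
# The 13 rows shown in the question, entered by hand.
# Column names are assumed to match the question's code (df$birth.year, df$tripduration).
df <- data.frame(
  tripduration = c(439, 186, 442, 170, 189, 494, 152, 537, 509, 157, 1080, 239, 344),
  birth.year   = c(1980, 1984, 1969, 1986, 1990, 1984, 1972, 1994, 1994, 1985, 1976, 1976, 1992),
  gender       = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2)
)

# Age in 2017, as defined in the question
age <- 2017 - df$birth.year

# Pearson correlation between age and trip duration for these rows
r <- cor(age, df$tripduration)
print(r)
```

These rows contain no missing values, so cor() returns a number here; the NA in the question presumably comes from rows of the full data set that are not shown.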


Based on the OP's comment, here is a new suggestion. Try deleting the rows with NA before performing a correlation test.

# Keep only the rows that have no NA in any column
df <- df[complete.cases(df), ]
age <- (2017 - as.numeric(df$birth.year))
cor(age, df$tripduration)
# [1] 0.1726607
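An alternative that leaves df untouched is the use argument of cor(), which drops incomplete pairs only for that one calculation. The sketch below uses made-up data (the first five rows from the question with one birth year deliberately set to NA), so its numbers are illustrative and not the OP's results.

```r
# Hypothetical data: first five rows of the question, with one birth year
# replaced by NA to mimic a missing value in the full data set.
df <- data.frame(
  tripduration = c(439, 186, 442, 170, 189),
  birth.year   = c(1980, NA, 1969, 1986, 1990)
)
age <- 2017 - df$birth.year

# Count the missing values first to confirm that they are the cause
n_missing <- sum(is.na(age))
print(n_missing)

# With the default use = "everything", any NA in the inputs makes the result NA
r_default <- cor(age, df$tripduration)
print(r_default)

# use = "complete.obs" drops the incomplete pairs for this calculation only
r_complete <- cor(age, df$tripduration, use = "complete.obs")
print(r_complete)
```

With only two vectors this matches filtering with complete.cases() on those two columns; the difference is that rows with an NA in some unrelated column are not discarded.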

Upvotes: 1
