ZhenyABB

Reputation: 17

Date coming through as Character, need to fix in R

I have imported a .csv file into R. The file has several columns (I simplified it to 4), and two of them, assigned and completed, should be dates. However, they are coming in as "character". I need them to be read as dates.

I have spent several hours searching and trying different things but cannot find a solution. This is what the data looks like (first 3 rows; I have about 6K rows total, per the summary below):

       id assigned completed score
1:     54 11/10/16  11/10/16     0
2:     54 11/21/16  11/21/16     7
3:     54  1/26/17   1/26/17    11


> summary(data_subset)
       id        assigned          completed        
 Min.   :   54   Length:5991        Length:5991       
 1st Qu.: 1375   Class :character   Class :character  
 Median : 1910   Mode  :character   Mode  :character  
 Mean   : 2145                                        
 3rd Qu.: 2199                                        
 Max.   :10410                                        

     score      
 Min.   : 0.00  
 1st Qu.: 4.00  
 Median : 7.00  
 Mean   : 8.33  
 3rd Qu.:12.00  
 Max.   :27.00  
 NA's   :1   

I tried lubridate's mdy() on the assigned column, but it replaced every value with NA.

library(lubridate)
data_subset$assigned <- mdy(data_subset$assigned)


       id assigned completed score
1:     54     <NA>  11/10/16     0
2:     54     <NA>  11/21/16     7
3:     54     <NA>   1/26/17    11

I am looking for a way to make assigned and completed be read as dates--whether it happens during the .csv import, or through data manipulation after it's already in R.

Upvotes: 0

Views: 685

Answers (1)

Jeff Parker

Reputation: 1979

Manipulation after importing approach:

data_subset$assigned <- as.Date(data_subset$assigned, format = '%m/%d/%y')   # base R, no extra packages needed
data_subset$completed <- as.Date(data_subset$completed, format = '%m/%d/%y') # '%m/%d/%y' describes your dates: month/day/two-digit year
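
To check the conversion before running it on the full data, here is a minimal sketch on a hand-typed copy of the three rows from the question. The data frame below is a stand-in I made up for illustration, not your real file, and date_cols is just a helper name. The loop assigns with [[ so that the same lines should also work if your object is a data.table, which the "1:" row labels in your printout suggest. Note that with %y, R reads two-digit years 00 to 68 as 20xx and 69 to 99 as 19xx.

```r
# Hypothetical stand-in for the first three rows shown in the question
data_subset <- data.frame(
  id        = c(54, 54, 54),
  assigned  = c("11/10/16", "11/21/16", "1/26/17"),
  completed = c("11/10/16", "11/21/16", "1/26/17"),
  score     = c(0, 7, 11),
  stringsAsFactors = FALSE
)

# Columns that hold month/day/two-digit-year strings
date_cols <- c("assigned", "completed")

# Convert each column in place with base R
for (col in date_cols) {
  data_subset[[col]] <- as.Date(data_subset[[col]], format = "%m/%d/%y")
}

# Both columns should now report class "Date"
str(data_subset)

# Number of values per column that failed to parse (ideally zero)
colSums(is.na(data_subset[date_cols]))
```

If the NA count is not zero on your real data, look at the offending raw strings before converting; they probably do not match the month/day/year pattern.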

Side note: I have been working on a similar problem, and lubridate has been behaving oddly for me lately. I suspect this is partly due to the R distribution: lubridate seems to work better on R 3.3.3 than on r-microsoft 3.3.3, where I have found certain functions from the package missing. Perhaps some underlying function is missing in your setup as well, which would explain why everything goes to NA. Again, this is just speculation, but maybe it leads to an answer.
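
As for doing it at import time: one possible way, sketched here under assumptions, is to wrap read.csv() and the conversion in a small helper so the dates are already Date objects when the data frame comes back. read_with_dates and its date_cols / date_format arguments are names I invented for this example, not part of any package, and the inline csv_text stands in for your file.

```r
# Inline stand-in for the .csv file (hypothetical sample, same layout as the question)
csv_text <- "id,assigned,completed,score
54,11/10/16,11/10/16,0
54,11/21/16,11/21/16,7
54,1/26/17,1/26/17,11"

# Hypothetical helper: read a CSV, then convert the named columns to Date
read_with_dates <- function(..., date_cols, date_format = "%m/%d/%y") {
  # Keep strings as character so as.Date() sees the raw text
  df <- read.csv(..., stringsAsFactors = FALSE)
  for (col in date_cols) {
    df[[col]] <- as.Date(df[[col]], format = date_format)
  }
  df
}

data_subset <- read_with_dates(text = csv_text,
                               date_cols = c("assigned", "completed"))

# The date columns are now Date, so date arithmetic works directly
str(data_subset)
data_subset$completed - data_subset$assigned
```

For a real file you would pass the path instead of text =, for example read_with_dates("my_file.csv", date_cols = c("assigned", "completed")), where my_file.csv is a placeholder name.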

Upvotes: 2
