Date coming through as Character, need to fix in R

Question

I have imported a .csv file into R. The file has several columns (I simplified to 4), and two of these columns, assigned and completed, should be dates. However, they are coming in as "character". I need them to be read as dates.

I have spent several hours searching and trying different things but cannot find a solution. This is what the data looks like (first 3 rows, I have 5K rows total):

       id assigned completed score
1:     54 11/10/16  11/10/16     0
2:     54 11/21/16  11/21/16     7
3:     54  1/26/17   1/26/17    11


> summary(data_subset)
       id        assigned          completed        
 Min.   :   54   Length:5991        Length:5991       
 1st Qu.: 1375   Class :character   Class :character  
 Median : 1910   Mode  :character   Mode  :character  
 Mean   : 2145                                        
 3rd Qu.: 2199                                        
 Max.   :10410                                        

     score      
 Min.   : 0.00  
 1st Qu.: 4.00  
 Median : 7.00  
 Mean   : 8.33  
 3rd Qu.:12.00  
 Max.   :27.00  
 NA's   :1

I tried lubridate on the assigned column, but it replaced all the values with NA.

library(lubridate)
data_subset$assigned <- mdy(data_subset$assigned)


       id assigned completed score
1:     54       11/10/16     0
2:     54       11/21/16     7
3:     54        1/26/17    11

I am looking for a way to make assigned and completed be read as dates--whether it happens during the .csv import, or through data manipulation after it's already in R.

Jeff Parker · Accepted Answer

Manipulation after importing approach:

data_subset$assigned <- as.Date(data_subset$assigned, format = '%m/%d/%y') # This uses base R
data_subset$completed <- as.Date(data_subset$completed, format = '%m/%d/%y') # '%m/%d/%y' specifies the format of your date: month/day/two-digit year
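As a quick way to check the conversion, here is a minimal self-contained sketch. It rebuilds the three sample rows from the question by hand (an assumption; your real data comes from the .csv) and converts both columns in one step. The helper name date_cols is only for illustration.

```r
# Sample rows copied from the question; the real data would come from the .csv import.
data_subset <- data.frame(
  id        = c(54, 54, 54),
  assigned  = c("11/10/16", "11/21/16", "1/26/17"),
  completed = c("11/10/16", "11/21/16", "1/26/17"),
  score     = c(0, 7, 11),
  stringsAsFactors = FALSE
)

# date_cols is a hypothetical helper name listing the columns to convert.
date_cols <- c("assigned", "completed")

# Apply the same base R conversion to each listed column.
data_subset[date_cols] <- lapply(data_subset[date_cols], as.Date, format = "%m/%d/%y")

# Both columns should now be reported as Date instead of chr.
str(data_subset)
```

After this, summary(data_subset) should report Min/Median/Max dates for both columns rather than "Class :character".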

Sidenote: I have been working on a similar problem, and lubridate has been doing weird things lately. I suspect the reason may be due in part to the version of R. lubridate seems to work better on R 3.3.3 than on r-microsoft 3.3.3. I have had certain functions from the package missing on the r-microsoft distribution. Perhaps some underlying function is missing, which is causing everything to go to NA. Again, this is just speculation, but maybe it leads to an answer.
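If values still come back as NA after converting, it may help to rule out the data itself before blaming the R distribution. The sketch below is only a rough diagnostic under assumptions: the vector raw holds made-up example values (not taken from the question) showing stray whitespace, an empty string, and a date in a different format.

```r
# Hypothetical raw values: one clean, one padded with spaces,
# one empty, and one in a different (ISO) format.
raw <- c("11/10/16", " 11/21/16 ", "", "2017-01-26")

# Trim surrounding whitespace before parsing with the expected format.
parsed <- as.Date(trimws(raw), format = "%m/%d/%y")

# Non-empty values that still failed to parse; these need a closer look.
bad <- raw[is.na(parsed) & nzchar(trimws(raw))]
print(bad)

# The R version in use, for comparing behaviour across distributions.
print(R.version.string)
```

Running the same check on data_subset$assigned would show whether the NAs come from unexpected values in the column or from the environment.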
