Pawan Rama Mali

Reputation: 551

Error: Unable to load time variables with missing values in python using pyreadr package from RData file

I want to run some Python functions on data from an '.RData' file, and I am using the 'pyreadr' Python package to load it.

Here is an example of the R code:

library(data.table)

# Creating demo data frame
data <- data.table(x_time = c(Sys.time(),Sys.time()+1,Sys.time()+2))
data_missing <- data.table(x_time = c(Sys.time(),NA,NA))

# checking the classes
sapply(data,class)
sapply(data_missing,class)

# Storing the data in RData file 
save(data, file = "test_data.RData")
save(data_missing, file = "test_missing_data.RData")

I am storing the data in separate files because 'test_data.RData' loads successfully in Python, whereas 'test_missing_data.RData' gives an error.

Here is the Python code:

##  Working demo
# import pyreadr
# result=pyreadr.read_r('test_data.RData')
# data=result['data']
# data.dtypes
# print(data)

### Error in below 

import pyreadr
result=pyreadr.read_r('test_missing_data.RData') # Error 
data=result['data']
data.dtypes
print(data)

The error message is as below:

C:\Users\Pawan\AppData\Local\R-MINI~1\envs\R-RETI~1\lib\site-packages\pandas\core\tools\datetimes.py:530: RuntimeWarning: invalid value encountered in multiply arr, tz_parsed = tslib.array_with_unit_to_datetime(arg, unit, errors=errors)

The error occurs when there are NA values in the data frame. Is there another way to load RData files in Python?

Thank you for your time and help.

Upvotes: 0

Views: 282

Answers (1)

Otto Fajardo

Reputation: 3417

This is a warning, not an error, which means it is probably not affecting your results. After running your R code, I can read the RData files without issue. Note, however, that you used the wrong name for the data frame in your Python code:

import pyreadr
result=pyreadr.read_r('test_missing_data.RData') # No error, just warning
# Your data frame is called data_missing, not data, because that is the name you gave it in your R code.
# I think this is what you are doing wrong.
# If you are not sure, check result.keys() to see what the file contains.
data=result['data_missing']
data.dtypes
#x_time    datetime64[ns]
#dtype: object
print(data)
#                       x_time
#0 2022-08-03 09:37:55.963370752
#1                           NaT
#2                           NaT

# Looks correct to me
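
To illustrate the two separate points above (a warning does not stop execution, while looking up a name that is not in the result does), here is a minimal sketch using only the standard library. The function load_with_warning is a hypothetical stand-in for the loader, not part of pyreadr; it only assumes that the loader returns a dict-like result keyed by object name.

```python
import warnings


def load_with_warning():
    """Hypothetical stand-in for a loader that warns but still returns data."""
    warnings.warn("invalid value encountered in multiply", RuntimeWarning)
    # Dict keyed by the object name that was saved on the R side (assumed shape)
    return {"data_missing": ["2022-08-03 09:37:55", None, None]}


# Record warnings instead of printing them, to show that the call still completes
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = load_with_warning()

print([w.category.__name__ for w in caught])  # ['RuntimeWarning']
print(list(result.keys()))                    # ['data_missing']

# Asking for a name that was never saved is what actually raises an exception
lookup_error = None
try:
    result["data"]
except KeyError as exc:
    lookup_error = exc
    print("KeyError:", exc)
```

In this sketch the warning is recorded and the data is still returned; only the lookup with the wrong key fails, which is why listing the keys first is a quick way to diagnose the problem.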

Upvotes: 1
