Thomas Philips
Thomas Philips

Reputation: 1089

R read_excel reads numeric data incorrectly

I'm trying to download and parse the Data worksheet in the file ie_data.xls from Professor Robert Shiller's home page (http://www.econ.yale.edu/~shiller/data.htm). I download the file from http://www.econ.yale.edu/~shiller/data/ie_data.xls, and then run the following script:

library(tidyverse)
ie_data <- read_excel("ie_data.xls", sheet = "Data", col_names = TRUE, 
                       col_types = "numeric", na = "", skip = 7) %>% 
           select(Date,E) %>%
           drop_na()

A bunch of warnings are generated, but more bothersome is the output

> names(ie_data)
[1] "Date" "E"   
> ie_data
# A tibble: 1,791 x 2
    Date     E
   <dbl> <dbl>
 1 1871.   0.4
 2 1871.   0.4
 3 1871.   0.4
 4 1871.   0.4
 5 1871.   0.4
 6 1871.   0.4
 7 1871.   0.4
 8 1871.   0.4
 9 1871.   0.4
10 1871.   0.4
# ... with 1,781 more rows
Warning message:
`...` is not empty.

We detected these problematic arguments:
* `needs_dots`

These dots only exist to allow future extensions and should be empty.
Did you misspecify an argument? 

The contents of both columns should have two decimal places (1871.01 represents January 1871, 1871.02 represents February 1871 and so on, and the second column is earnings per share rounded to the nearest penny), but everything after the decimal point is gone in the first column at the head of the dataframe! Even more mysterious is its tail:

> tail(ie_data)
# A tibble: 6 x 2
   Date     E
  <dbl> <dbl>
1 2019.  135.
2 2019.  137.
3 2019.  139.
4 2020.  132.
5 2020.  124.
6 2020.  116.
Warning message:
`...` is not empty.

We detected these problematic arguments:
* `needs_dots`

These dots only exist to allow future extensions and should be empty.
Did you misspecify an argument?  

Now both columns have lost their fractional part! What change do I need to make to my code in order to read these columns correctly?

Sincerely and with many thanks in advance

Thomas Philips

Upvotes: 0

Views: 1034

Answers (2)

dspn
dspn

Reputation: 314

try it with col_types = "text" Don't really know why numeric will get you trimmed numbers but i seem to get it working with text (provided you later convert to a rounded number)

Upvotes: 1

Matt
Matt

Reputation: 7385

You can do the following to see more significant digits in your console when printing your data with ie_data. This doesn't affect your data, only the way it is shown when printed to your console.

options(pillar.sigfig = 10)
ie_data

Which will show:

  Date     E
     <dbl> <dbl>
 1 1871.01   0.4
 2 1871.02   0.4
 3 1871.03   0.4
 4 1871.04   0.4
 5 1871.05   0.4
 6 1871.06   0.4
 7 1871.07   0.4
 8 1871.08   0.4
 9 1871.09   0.4
10 1871.1    0.4
# ... with 1,781 more rows

If you use the following:

options(pillar.sigfig = 1)
ie_data

You will get:

# A tibble: 1,791 x 2
    Date     E
   <dbl> <dbl>
 1 1871.   0.4
 2 1871.   0.4
 3 1871.   0.4
 4 1871.   0.4
 5 1871.   0.4
 6 1871.   0.4
 7 1871.   0.4
 8 1871.   0.4
 9 1871.   0.4
10 1871.   0.4
# ... with 1,781 more rows

Upvotes: 1

Related Questions