Can't import this excel file into R

Question

I'm having trouble importing a file into R. The file was obtained from this website: https://report.nih.gov/award/index.cfm, where I clicked "Import Table" and downloaded a .xls file for the year 1992.

Here's what I've tried typing into the console, along with the results:

Input:

> library('readxl')
> data1992 <- read_excel("1992.xls")

Output:

Not an excel file
Error in eval(substitute(expr), envir, enclos) : 
  Failed to open /home/chrx/Documents/NIH Funding Awards, 1992 - 2016/1992.xls

Input:

> data1992 <- read.csv ("1992.xls", sep ="	")

Output:

Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
  more columns than column names

I'm not sure whether this is relevant, but I'm using GalliumOS (Linux), so Excel isn't installed on my computer. LibreOffice is.

C8H10N4O2 · Accepted Answer

Why bother getting the data in and out of a .csv when it's right there on the web page for you to scrape?

# note the query parameters in the url when you apply a filter, e.g. fy=
url <- 'http://report.nih.gov/award/index.cfm?fy=1992'

library('rvest')
library('magrittr')
library('dplyr')
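df <- url %>%
        read_html() %>%                                # download and parse the page
        html_nodes(xpath='//*[@id="orgtable"]') %>%    # select the element with id "orgtable"
        html_table() %>%                               # convert each matched table to a data frame
        extract2(1) %>%                                # take the first table from the resulting list
        mutate(Funding = as.numeric(gsub('[^0-9.]','',Funding)))  # strip non-numeric characters, then convert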

head(df)

returns

                              Organization          City State       Country Awards Funding
1 A.T. STILL UNIVERSITY OF HEALTH SCIENCES    KIRKSVILLE    MO UNITED STATES      3  356221
2                     AAC ASSOCIATES, INC.        VIENNA    VA UNITED STATES     10 1097158
3       AARON DIAMOND AIDS RESEARCH CENTER      NEW YORK    NY UNITED STATES      3  629946
4                      ABBOTT LABORATORIES NORTH CHICAGO    IL UNITED STATES      4 1757241
5                            ABIOMED, INC.       DANVERS    MA UNITED STATES      6 2161146
6                     ABRATECH CORPORATION     SAUSALITO    CA UNITED STATES      1  450411
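
To see what the mutate() step does to a single value, here is a small sketch in base R. The sample strings are assumptions about how the page might format dollar amounts, not values copied from it.

```r
# Hypothetical raw values; the exact formatting on the page is an assumption
raw_funding <- c("$356,221", "$1,097,158", "2161146")

# Drop every character that is not a digit or a decimal point, then convert
clean_funding <- as.numeric(gsub("[^0-9.]", "", raw_funding))

print(clean_funding)
```

The pattern `[^0-9.]` matches anything other than digits and the decimal point, so currency symbols and thousands separators are removed before `as.numeric()` is applied.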

If you need to loop through years 1992 to present, or something similar, this programmatic approach will save you a lot of time versus handling a bunch of flat files.
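
A loop over years could look roughly like the sketch below. The helper names (`build_url`, `fetch_year`, `fetch_years`) are hypothetical, and it assumes every year's page uses the same `orgtable` id and the same columns, which I have not checked for each year.

```r
# Build the page URL for one fiscal year (fy= is the query parameter noted above)
build_url <- function(year) {
  paste0("http://report.nih.gov/award/index.cfm?fy=", year)
}

# Scrape one year's table and tag each row with its year.
# Hypothetical helper: needs the rvest package and network access.
fetch_year <- function(year) {
  page  <- rvest::read_html(build_url(year))
  nodes <- rvest::html_nodes(page, xpath = '//*[@id="orgtable"]')
  tbl   <- as.data.frame(rvest::html_table(nodes)[[1]])
  # Clean Funding only if it was read as text (assumed to be the usual case)
  if (!is.numeric(tbl$Funding)) {
    tbl$Funding <- as.numeric(gsub("[^0-9.]", "", tbl$Funding))
  }
  tbl$Year <- year
  tbl
}

# Fetch several years and stack them into one data frame.
# rbind() assumes the column names match across years.
fetch_years <- function(years) {
  do.call(rbind, lapply(years, fetch_year))
}

# Example call (not run here because it hits the network):
# all_years <- fetch_years(1992:2016)
```

Keeping the URL construction in its own function makes it easy to check the generated addresses before sending any requests.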
