Sam S.

Reputation: 803

R: read a URL table with a free-text column

I would like to read a text table from a URL. The table has 3 columns; the second is a character column containing a few words, with quotation marks around multi-word values. The data is not publicly accessible, so I cannot give the link here, but this is an example of what the data looks like when you open the http link:

col1  "column second" 
col3
1 "a city name" 2323
20 second 4343
30 "third row" 43434

'col1', '"column second"' and 'col3' are the column names, and this is how the header looks at the real URL. I tried a few read functions such as read_delim(), readline(), read.table and fread, but none of them could read the data correctly. When I download the data or copy/paste it into a file, it reads without any problem, but it fails when I try to read directly from the URL. The problem seems to be the "" in the second column. For example, if I set sep=" ", the first row of the data has 5 columns, the second row 3 columns and the third row 4 columns.

I appreciate your kind help.

Upvotes: 0

Views: 462

Answers (2)

Sam S.

Reputation: 803

The answer by Grothendieck is perfect. I also found another solution, https://www.r-bloggers.com/getting-data-from-an-online-source/, for those who might be interested in reading URL tables.

library(RCurl)
# The url link provided in the comment by Grothendieck
url <- 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/archived_data/archived_daily_case_updates/02-12-2020_1020.csv'
# Download the page body as one string (note: SSL verification is switched off here)
myfile <- getURL(url, ssl.verifyhost=FALSE, ssl.verifypeer=FALSE)
# Parse the downloaded text as CSV
mydat <- read.csv(textConnection(myfile), header=TRUE)
head(mydat)


The problem with the URL in my question was that the data was not served in a raw format; it was more like a file on OneDrive or Google Drive. That could go in a separate question, but you are welcome to share answers or links here on how to read that type of data.

Upvotes: 0

G. Grothendieck

Reputation: 269860

Use scan to read the data into a character vector s, reshape all but the first 3 elements into a 3-column matrix and then a data frame DF, using those first 3 elements as the column names. Finally, convert the type of each column of DF. Because scan splits on any whitespace, including newlines, and respects the quotes, it does not matter that the header is spread over two lines. We have used scan to read from Lines, shown in the Note at the end, but it can also read from a file or connection using the file= argument of scan. No packages are used.

s <- scan(text = Lines, what = "", quiet = TRUE)
DF <- setNames(as.data.frame(matrix(tail(s, -3),, 3, byrow = TRUE)), s[1:3])
DF[] <- lapply(DF, type.convert, as.is = TRUE)  # as.is avoids the R >= 4.0 warning

giving:

> DF
  col1 column second  col3
1    1   a city name  2323
2   20        second  4343
3   30     third row 43434

Note

Input in reproducible form:

Lines <- 'col1  "column second" 
col3
1 "a city name" 2323
20 second 4343
30 "third row" 43434'
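
To illustrate the file= route mentioned above, here is a minimal sketch that wraps the same steps in a function and reads from a file rather than from text. The helper name read_ws_table, its n_cols parameter and the temporary file are assumptions made for this example and are not part of the original answer; the temporary file merely stands in for the real source, and a connection could be passed in the same way.

```r
# Sample input written to a temporary file that stands in for the remote table
# (assumption: the real source serves the same plain-text layout).
sample_lines <- c('col1  "column second" ',
                  'col3',
                  '1 "a city name" 2323',
                  '20 second 4343',
                  '30 "third row" 43434')
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

# Hypothetical helper: src may be a file name or a connection.
read_ws_table <- function(src, n_cols = 3) {
  # scan() splits on any whitespace (including newlines) and honours quotes
  s <- scan(file = src, what = "", quiet = TRUE)
  # the first n_cols tokens are the header, the rest are the cell values
  body <- matrix(tail(s, -n_cols), ncol = n_cols, byrow = TRUE)
  df <- setNames(as.data.frame(body, stringsAsFactors = FALSE), head(s, n_cols))
  # convert each column from character to its natural type
  df[] <- lapply(df, type.convert, as.is = TRUE)
  df
}

DF <- read_ws_table(tmp)
str(DF)
```

Whether this works against a particular URL depends on that URL returning the raw text rather than an HTML page around it.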

Upvotes: 1
