micturalgia

Reputation: 325

Table scraped from a web page is read as a single character vector: how to convert into a dataframe?

I have scraped a large table from a web page using the rvest package, but it is read in as a single character vector:

foo<-c("A","B","C","Dog","1","2","3","Cat","4","5","6","Goat","7","8","9")

that I need to deal with as a dataframe that looks like this:

bar<-as.data.frame(cbind(Animal=c("Dog","Cat","Goat"),A=c(1,4,7),B=c(2,5,8),C=c(3,6,9)))

This might be a simple dilemma but I'd appreciate the help.

Upvotes: 1

Views: 150

Answers (4)

DJJ

Reputation: 2549

Here is a convenient helper for working with lists:

seqList <- function(character, by = 1, res = list()) {
    ### recursively split `character` into consecutive chunks of length `by`
    if (length(character) == 0) {
        res
    } else {
        seqList(character[-c(1:by)], by = by, res = c(res, list(character[1:by])))
    }
}

Once the character vector is split into a list, it is easier to manipulate. For instance, you can do:

options(stringsAsFactors=FALSE)

foo <-c("A","B","C","Dog","1","2","3","Cat","4","5","6","Goat","7","8","9")
foo <- c("Animal",foo)

df <- data.frame(t(do.call("rbind",
    lapply(1:4,function(x) do.call("cbind",lapply(seqList(foo,4),"[[",x))))))

colnames(df) <- df[1,]

df <- df[-1,]

## > df
##   Animal A B C
## 2    Dog 1 2 3
## 3    Cat 4 5 6
## 4   Goat 7 8 9

Note: I haven't tested the efficiency of the function. It might not be very efficient for a large number of elements. A matrix might be a better tool for this job.
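As a possible non-recursive alternative to `seqList`, the chunking can be sketched with `split()`. The helper name `chunk` is hypothetical and not part of the answer above; this is a minimal sketch, not a benchmarked replacement.

```r
# Hypothetical helper (the name `chunk` is an assumption):
# split x into consecutive chunks of length `by` without recursion.
chunk <- function(x, by = 1) {
  # ceiling(seq_along(x) / by) labels elements 1..by as group 1, the next `by` as group 2, ...
  unname(split(x, ceiling(seq_along(x) / by)))
}

foo <- c("Animal", "A", "B", "C", "Dog", "1", "2", "3",
         "Cat", "4", "5", "6", "Goat", "7", "8", "9")
chunks <- chunk(foo, 4)
```

Here `chunks` should be a list of four character vectors, one per table row, in the same shape that `seqList(foo, 4)` returns.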

Upvotes: 0

Rich Scriven

Reputation: 99351

If you want the proper column types, you can try this. Split into a list, name the list, then convert the column types before coercing to data frame.

l <- setNames(split(tail(foo, -3), rep(1:4, 3)), c("Animal", foo[1:3]))
as.data.frame(lapply(l, type.convert))  ## stringsAsFactors=FALSE if desired
#    Animal A B C
# 1     Dog 1 2 3
# 2     Cat 4 5 6
# 3    Goat 7 8 9
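As far as I know, recent R releases expect the caller to pass `as.is` to `type.convert()` explicitly and warn when it is missing. A hedged variant of the same idea, assuming you want `Animal` kept as character rather than factor:

```r
foo <- c("A", "B", "C", "Dog", "1", "2", "3", "Cat", "4", "5", "6", "Goat", "7", "8", "9")

# Same split-and-name step as above: every 4th element goes to the same column.
l <- setNames(split(tail(foo, -3), rep(1:4, 3)), c("Animal", foo[1:3]))

# as.is = TRUE keeps non-numeric columns as character instead of factor.
res <- as.data.frame(lapply(l, type.convert, as.is = TRUE),
                     stringsAsFactors = FALSE)
str(res)
```

With this input, `A`, `B` and `C` should come out as integer columns and `Animal` as character.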

Upvotes: 1

d.b

Reputation: 32548

Just split it into the required number of rows and rbind them. I added "Animal" at the start of foo so that every row has the same number of elements when splitting.

foo = c("Animal", foo)
df = data.frame(do.call(rbind, split(foo, ceiling(seq_along(foo)/4))),
                stringsAsFactors = FALSE)
colnames(df) = df[1,]
df = df[-1,]
df
#  Animal A B C
#2    Dog 1 2 3
#3    Cat 4 5 6
#4   Goat 7 8 9

Upvotes: 2

David Heckmann

Reputation: 2939

You can create a matrix from your vector and turn it into a data frame:

foo<-c("A","B","C","Dog","1","2","3","Cat","4","5","6","Goat","7","8","9")
foo <- c("Animal" , foo)
m <- matrix(foo , ncol = 4  , byrow = TRUE)
df <- as.data.frame(m[-1,] , stringsAsFactors = FALSE)  
colnames(df) <- m[1,]
# I assume you want numerics for your A,B,C columns:
df[,2:4]<-apply(df[,2:4],2,as.numeric)

lapply(df,class)
$Animal
[1] "character"

$A
[1] "numeric"

$B
[1] "numeric"

$C
[1] "numeric"
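
For a larger scraped table, the matrix approach could be wrapped in a small function. The name `vec_to_df` and its arguments are hypothetical; the sketch assumes the first `ncol - 1` elements are column headers and that the header of the first column is missing, as in the question.

```r
# Hypothetical wrapper (name and arguments are assumptions, not from the answer above).
vec_to_df <- function(x, first_header, ncol) {
  x <- c(first_header, x)                    # supply the missing first header
  stopifnot(length(x) %% ncol == 0)          # the vector must fill complete rows
  m <- matrix(x, ncol = ncol, byrow = TRUE)  # fill row by row, as on the web page
  df <- as.data.frame(m[-1, , drop = FALSE], stringsAsFactors = FALSE)
  colnames(df) <- m[1, ]
  # Convert each column to its natural type; character columns stay character.
  df[] <- lapply(df, type.convert, as.is = TRUE)
  df
}

foo <- c("A", "B", "C", "Dog", "1", "2", "3", "Cat", "4", "5", "6", "Goat", "7", "8", "9")
bar <- vec_to_df(foo, first_header = "Animal", ncol = 4)
```

Note that `type.convert()` yields integer columns here, whereas `as.numeric` above yields doubles; either should work for most purposes.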

Upvotes: 5
