aldorado

Reputation: 4844

How to stack input from two csv files?

I am trying to use the stack command on data loaded from two tab-separated text files (named .csv) that I want to compare. I want to use crossprod(table(stack(data))) to see how many strings the different columns have in common (in the example, "dog" and "cat"). The columns in the two files contain different numbers of strings.

> one<-read.delim("one.csv",sep="\t",header=F)
> two<-read.delim("two.csv",sep="\t",header=F)

> one
       V1
1     dog
2 hamster
3   mouse
4     cat

> two
      V1
1    dog
2    cat
3 rabbit

> data<-list(one,two)
> stack(data)
Error in stack.default(data) : at least one vector element is required

If I create the vectors manually, e.g. one<-c("dog",...), it works. What am I doing wrong, and how can I do this right?

Upvotes: 1

Views: 325

Answers (1)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

There are a few problems to address before stack will work as you intend.

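  1. stack ignores factor variables: it keeps only elements for which is.vector() is TRUE, and that is FALSE for a factor.
  2. stack needs a named list, because the names become the ind column.
  3. stack does not work with nested lists, and a data.frame is itself a special type of list, so a list of data.frames is a nested list.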
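A quick check of points 1 and 3, as an illustrative sketch (df and f are hypothetical names, not from the question). stack() keeps only list elements for which is.vector() returns TRUE, and neither a data.frame nor a factor passes that test:

```r
# Illustrative objects (hypothetical names).
df <- data.frame(V1 = c("dog", "cat"), stringsAsFactors = FALSE)
f  <- factor(c("dog", "cat"))

is.list(df)       # TRUE: a data.frame is a list of columns
is.vector(df)     # FALSE: extra attributes (class, row.names), so stack() drops it
is.vector(f)      # FALSE: factors carry levels and class, so they are dropped too
is.vector(df$V1)  # TRUE: a plain character column is kept
```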

Let's address each of these in turn:

Make sure that your read.delim (or read.table) call includes stringsAsFactors = FALSE; this has only been the default since R 4.0.0. Here, I'm creating two data.frames with that argument included.

one <- data.frame(V1 = c("dog", "hamster", "mouse", "cat"), stringsAsFactors=FALSE)
two <- data.frame(V1 = c("dog", "cat", "rabbit"), stringsAsFactors=FALSE)
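If you read the files instead of typing the data in, the same argument goes into read.delim. A minimal sketch, where the text= argument is a stand-in for one.csv (an assumption, so that it runs without the file):

```r
# text= replaces the file name here; with the real file use "one.csv" instead.
one <- read.delim(text = "dog\nhamster\nmouse\ncat", header = FALSE,
                  stringsAsFactors = FALSE)
class(one$V1)  # "character", not "factor"
```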

Make sure that your list is a named list.

data <- list(one = one, two = two)
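For comparison, a named list of plain character vectors (the manual case from the question) stacks directly, and the list names become the ind column. A small sketch; s is a hypothetical name:

```r
# Named list of plain character vectors: no data.frames, no factors.
s <- stack(list(one = c("dog", "cat"), two = "rabbit"))
s$values  # "dog" "cat" "rabbit"
s$ind     # factor: one one two
```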

With two of the requirements met, test again. The error remains:

stack(data)
# Error in stack.default(data) : at least one vector element is required

"Flatten" your list, but only by one level: use recursive = FALSE so that the columns of each data.frame become the elements of a single list. Test with stack:

stack(unlist(data, recursive=FALSE))
#    values    ind
# 1     dog one.V1
# 2 hamster one.V1
# 3   mouse one.V1
# 4     cat one.V1
# 5     dog two.V1
# 6     cat two.V1
# 7  rabbit two.V1
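To see why recursive = FALSE matters, here is a sketch comparing one level of flattening with full flattening on the same data (flat and full are hypothetical names):

```r
one <- data.frame(V1 = c("dog", "hamster", "mouse", "cat"), stringsAsFactors = FALSE)
two <- data.frame(V1 = c("dog", "cat", "rabbit"), stringsAsFactors = FALSE)
data <- list(one = one, two = two)

# One level: each data.frame column becomes one element of a flat list.
flat <- unlist(data, recursive = FALSE)
names(flat)                          # "one.V1" "two.V1"
vapply(flat, is.vector, logical(1))  # both TRUE, so stack() keeps them

# Full flattening gives a single character vector of 7 strings, each with
# its own name, so stack() would put every string in a separate group.
full <- unlist(data)
length(full)                         # 7
```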

From there, you can use crossprod or tcrossprod. tcrossprod counts, for each pair of strings, how many columns contain both:

tcrossprod(table(stack(unlist(data, recursive=FALSE))))
#          values
# values    cat dog hamster mouse rabbit
#   cat       2   2       1     1      1
#   dog       2   2       1     1      1
#   hamster   1   1       1     1      0
#   mouse     1   1       1     1      0
#   rabbit    1   1       0     0      1
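The question asked how many strings the columns have in common, which is crossprod rather than tcrossprod. A minimal sketch on the same data (tab and shared are hypothetical names); the intersect() line is a plain base-R cross-check, not part of the stack approach:

```r
one <- data.frame(V1 = c("dog", "hamster", "mouse", "cat"), stringsAsFactors = FALSE)
two <- data.frame(V1 = c("dog", "cat", "rabbit"), stringsAsFactors = FALSE)
data <- list(one = one, two = two)

# Contingency table: rows are strings, columns are the source columns.
tab <- table(stack(unlist(data, recursive = FALSE)))

# crossprod(tab) = t(tab) %*% tab: entry [i, j] counts strings present in
# both column i and column j; the diagonal is the size of each column.
shared <- crossprod(tab)
shared["one.V1", "two.V1"]          # 2 ("cat" and "dog")

# Cross-check with base set operations.
length(intersect(one$V1, two$V1))   # 2
```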

Upvotes: 3
