Reputation: 942
The following list "ls" contains three data frames:
unigrams = data.frame(freq = c(3, 3, 5, 4, 3, 41),
                      term = c("a-list", "a-p", "aaa", "aam", "aamir", "aaron"))
bigrams = data.frame(freq = c(13, 1, 1, 2, 1, 4),
                     term = c("a a", "a abode", "a about", "a absolutely", "a accessory", "a acre"))
trigrams = data.frame(freq = c(1, 1, 1, 1, 1, 1),
                      term = c("a a card", "a a divorce", "a a dreamer", "a a great", "a a guy", "a a hand"))
ls = list(unigrams, bigrams, trigrams)
Which gives us this:
[[1]]
freq term
1 3 a-list
2 3 a-p
3 5 aaa
4 4 aam
5 3 aamir
6 41 aaron
[[2]]
freq term
1 13 a a
2 1 a abode
3 1 a about
4 2 a absolutely
5 1 a accessory
6 4 a acre
[[3]]
freq term
1 1 a a card
2 1 a a divorce
3 1 a a dreamer
4 1 a a great
5 1 a a guy
6 1 a a hand
I want to split the column "term" in each data frame into one column per word, named "word1", "word2", "word3". Like this:
freq word1
1 3 a-list
2 3 a-p
3 5 aaa
4 4 aam
5 3 aamir
6 41 aaron
freq word1 word2
1 13 a a
2 1 a abode
3 1 a about
4 2 a absolutely
5 1 a accessory
6 4 a acre
freq word1 word2 word3
1 1 a a card
2 1 a a divorce
3 1 a a dreamer
4 1 a a great
5 1 a a guy
6 1 a a hand
My try:
new_ls = list()
for (i in length(ls)) {
  x = ls[[i]]
  # Split each word in column "term":
  x[,paste("word", 1:i, sep = "")] = as.character(lapply(strsplit(as.character(x$term), split=" "), "[", i))
  x = subset(x, select = -term)
  new_ls[[i]] = x
}
Unfortunately, this snippet only fills the last element of the list, and with the wrong result:
[[1]]
NULL
[[2]]
NULL
[[3]]
freq word1 word2 word3
1 1 card card card
2 1 divorce divorce divorce
3 1 dreamer dreamer dreamer
4 1 great great great
5 1 guy guy guy
6 1 hand hand hand
What am I doing wrong?
Upvotes: 2
Views: 44
Reputation: 51582
The splitstackshape library makes this task easy:
library(splitstackshape)
lapply(ls, function(i) cSplit(i, 'term', sep = ' ', direction = 'wide'))
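As for what goes wrong in the loop: `for (i in length(ls))` iterates over the single value 3 rather than over 1, 2, 3, so only the third element is ever filled. On top of that, `"[", i` extracts only the i-th word of each term, and that one vector is then recycled into every new column. If you prefer to stay in base R, here is a minimal sketch, assuming every term within one data frame has the same number of words (the name `words` is mine, not from the question):

```r
# Same input as in the question
unigrams <- data.frame(freq = c(3, 3, 5, 4, 3, 41),
                       term = c("a-list", "a-p", "aaa", "aam", "aamir", "aaron"))
bigrams <- data.frame(freq = c(13, 1, 1, 2, 1, 4),
                      term = c("a a", "a abode", "a about", "a absolutely", "a accessory", "a acre"))
trigrams <- data.frame(freq = c(1, 1, 1, 1, 1, 1),
                       term = c("a a card", "a a divorce", "a a dreamer", "a a great", "a a guy", "a a hand"))
ls <- list(unigrams, bigrams, trigrams)

new_ls <- lapply(ls, function(x) {
  # Split every term on spaces and bind the pieces into a matrix:
  # one row per term, one column per word.
  # Assumption: all terms in x contain the same number of words.
  words <- do.call(rbind, strsplit(as.character(x$term), split = " ", fixed = TRUE))
  colnames(words) <- paste0("word", seq_len(ncol(words)))
  # Keep "freq", drop "term", append the word columns
  data.frame(freq = x$freq, words, stringsAsFactors = FALSE)
})
```

With the data from the question, `new_ls[[3]]` ends up with the columns freq, word1, word2 and word3. If you would rather keep your explicit loop, iterating with `for (i in seq_along(ls))` should address the first problem; the word extraction still needs to cover all i words rather than only the i-th one.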
Upvotes: 1