NiuBiBang

Reputation: 628

How to repeat a variable down a list?

I have a field of semantic tags/semantic tag categories, along with Source, Date, and ID variables. I want to break the semantic tag field out into the respective tags/tag categories, then reshape the dataset so that each tag gets its own row. I have most of the code worked out, but am still stuck on getting the ID/Date/Source variables to repeat down the matrix I create from the tag categories/tags. An example of the data I start with (tab-delimited) is below:

ID  Source  Date  Semantic Tags
1 thestate  2013-01-18  Person:elizabeth colbert-busch, Organization:congress
2 abcnews4  2013-04-03  PoliticalEvent:congressional race, Person:colbert busch, topicname:politics
3 Politics  2013-04-02  Person:mark sanford, Person:elizabeth colbert busch, Person:colbert busch, Organization:republican party
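For anyone who wants to run the code in this thread, the sample rows can be entered as a data frame. This is a minimal sketch: the column name Semantic.Tags assumes the file was read with read.delim(), whose default check.names = TRUE turns the header "Semantic Tags" into Semantic.Tags, the name the code below uses.

```r
# Sample data from the question; column names mimic what read.delim() would produce
et <- data.frame(
  ID = 1:3,
  Source = c("thestate", "abcnews4", "Politics"),
  Date = c("2013-01-18", "2013-04-03", "2013-04-02"),
  Semantic.Tags = c(
    "Person:elizabeth colbert-busch, Organization:congress",
    "PoliticalEvent:congressional race, Person:colbert busch, topicname:politics",
    "Person:mark sanford, Person:elizabeth colbert busch, Person:colbert busch, Organization:republican party"
  ),
  stringsAsFactors = FALSE
)
str(et)
```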

I want the data to look like a database format (also tab-delimited):

ID  Source  Date  Tag Type  Tag
1 thestate  2013-01-18  Person  elizabeth colbert-busch
1 thestate  2013-01-18  Organization  congress
2 abcnews4  2013-04-03  PoliticalEvent  congressional race
2 abcnews4  2013-04-03  Person  colbert busch
2 abcnews4  2013-04-03  topicname  politics
3 Politics  2013-04-02  Person  mark sanford
3 Politics  2013-04-02  Person  elizabeth colbert busch
3 Politics  2013-04-02  Person  colbert busch
3 Politics  2013-04-02  Organization  republican party

I'm having no trouble separating the tag types and tags (thanks to @Tyler Rinker for help on that), but I am stuck on getting the ID, Source, and Date variables to repeat row by row down the tag type/tag matrix that I create. Can anyone help? My code is below:

et3 <- lapply(strsplit(as.character(et$Semantic.Tags), ","), function(x) gsub("^\\s+|\\s+$", "", x)) # break out semantic tags/tag types by comma and trim surrounding whitespace

et3 <- lapply(et3, strsplit, ":(?!/)", perl=TRUE) # break on colon
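To see what these two splitting steps produce, here is a sketch on a single tag string. The url: tag is hypothetical, added only to show why the Perl lookahead (?!/) is in the pattern: it prevents a colon that is followed by a slash, as in http://, from being treated as a separator. Inside an R string, \\s is the regex escape for whitespace.

```r
# A single tag string; the url: tag is a made-up example containing "http://"
tags <- "Person:mark sanford, url:http://example.com"

# Step 1: split on commas, then trim leading/trailing whitespace from each piece
parts <- lapply(strsplit(tags, ","), function(x) gsub("^\\s+|\\s+$", "", x))
# parts[[1]] is c("Person:mark sanford", "url:http://example.com")

# Step 2: split each piece on a colon that is NOT followed by "/"
pairs <- lapply(parts, strsplit, ":(?!/)", perl = TRUE)
# pairs[[1]][[2]] is c("url", "http://example.com"); the URL stays intact
str(pairs)
```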

The following lines of code, where I try to replicate the other three variables, are where I run into problems:

Date <- rep(et$Date, seq_along(et3), sapply(et3, length))

ID <- rep(et$ID, seq_along(et3), sapply(et3, length)) # Note that if I don't use "et$ID", the IDs replicate without issue...

...and likewise for the Source variable. The warning message I receive is: In rep(et$Date, seq_along(et3), sapply(et3, length)) : first element used of 'length.out' argument. Only the first value appears in the output. The same problem happens if I first bind the et3 lists into a matrix. Can anyone help with repeating the variables down a matrix/list? I have also tried a transpose command, but I don't know how to handle the tags that I turned into lists.
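The warning comes from how rep() matches unnamed arguments: after x, the second one is taken as times and the third as length.out, so sapply(et3, length) ends up in length.out, which must be a single number. When length.out is given, times is ignored. A minimal sketch on toy values (ids and n_tags are illustrative stand-ins, not the question's objects) contrasting the two calls:

```r
# Toy stand-ins for et$ID and sapply(et3, length): three rows with 2, 3 and 4 tags
ids <- c(1, 2, 3)
n_tags <- c(2, 3, 4)

# Positional call as in the question: rep(x, times, length.out).
# A vector in the length.out slot triggers the "first element used" warning,
# and length.out = 2 (its first element) truncates the result.
bad <- suppressWarnings(rep(ids, seq_along(ids), n_tags))
# bad is c(1, 2)

# Passing the per-row counts as times repeats each element its own number of times
good <- rep(ids, times = n_tags)
# good is c(1, 1, 2, 2, 2, 3, 3, 3, 3)
```

Applied to the question's objects this would be rep(et$Date, times = sapply(et3, length)), and the same for ID and Source. In R 3.2.0 and later, lengths(et3) is a shorter equivalent of sapply(et3, length).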

Thanks for any help.

Upvotes: 1

Views: 771

Answers (1)

mnel

Reputation: 115382

# 1. turn each row's list of c(tag, value) pairs into a two-column matrix
et3 <- lapply(et3, function(x) {
  xx <- do.call(rbind, x)
  colnames(xx) <- c('tag', 'value')
  xx
})

# 2. cycle through each row and recombine it with that row's ID, Source and Date
do.call(rbind, lapply(seq_len(nrow(et)),
  function(x) cbind(et[x, 1:3, drop = FALSE], et3[[x]])))
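Step 2 builds one small data frame per row and then binds them. An alternative sketch, not part of the original answer, avoids the per-row loop: repeat each row index once per tag with rep(), then bind all tag matrices in a single call. It assumes et3 already holds the two-column matrices from step 1; the toy et and et3 below are stand-ins so the block runs on its own.

```r
# Toy stand-ins: et holds the row-level variables, et3 the per-row tag matrices
et <- data.frame(ID = 1:2, Source = c("thestate", "abcnews4"),
                 Date = c("2013-01-18", "2013-04-03"), stringsAsFactors = FALSE)
et3 <- list(
  rbind(c("Person", "elizabeth colbert-busch"), c("Organization", "congress")),
  rbind(c("PoliticalEvent", "congressional race"), c("Person", "colbert busch"),
        c("topicname", "politics"))
)
et3 <- lapply(et3, function(m) { colnames(m) <- c("tag", "value"); m })

# Repeat each row index once per tag, then bind all tag matrices at once
idx <- rep(seq_len(nrow(et)), times = sapply(et3, nrow))
res <- data.frame(et[idx, 1:3], do.call(rbind, et3), stringsAsFactors = FALSE)
rownames(res) <- NULL  # drop the "1.1", "2.1", ... row names created by indexing
res
```

Indexing with a repeated row vector does the repetition for all three variables at once, so ID, Source and Date stay aligned without separate rep() calls.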

data.table approach

# an alternative is to use data.table
library(data.table)
EDT <- data.table(et)
# string processing: split on commas and trim, then split each tag on the colon
EDT[, sc := lapply(strsplit(as.character(Semantic.Tags), ","), function(x) gsub("^\\s+|\\s+$", "", x))]
EDT[, et3 := lapply(sc, strsplit, ":(?!/)", perl = TRUE)]

# rapply and by to create data.table
EDT[, list(tag = rapply(et3, classes = 'character', function(x) x[1]),
           value = rapply(et3, classes = 'character', function(x) x[2])),
      by = list(ID, Source, Date)]



   ID   Source       Date            tag                   value
1:  1 thestate 2013-01-18         Person elizabeth colbert-busch
2:  1 thestate 2013-01-18   Organization                congress
3:  2 abcnews4 2013-04-03 PoliticalEvent      congressional race
4:  2 abcnews4 2013-04-03         Person           colbert busch
5:  2 abcnews4 2013-04-03      topicname                politics
6:  3 Politics 2013-04-02         Person            mark sanford
7:  3 Politics 2013-04-02         Person elizabeth colbert busch
8:  3 Politics 2013-04-02         Person           colbert busch
9:  3 Politics 2013-04-02   Organization        republican party
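The rapply() calls work because, within each by group, et3 is a list holding that row's list of c(tag, value) pairs. rapply() applies the function to every character vector in the nested list and, with its default how = "unlist", returns a flat vector of the results. A base-R sketch of just that step on a toy nested list (nested is an illustrative name):

```r
# Toy nested list shaped like one group's et3: a list containing a list of pairs
nested <- list(list(c("Person", "mark sanford"),
                    c("Organization", "republican party")))

# Apply f to every character leaf; how = "unlist" (the default) flattens the result
tag   <- rapply(nested, function(x) x[1], classes = "character")
value <- rapply(nested, function(x) x[2], classes = "character")
# tag is c("Person", "Organization"); value is c("mark sanford", "republican party")
```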

Upvotes: 4
