Reputation: 10444
I want to analyze many years of of Quicken home finance records. I exported a file to qif and used bank2csv program to render a csv. Within Quicken one can use a category (eg automobile, tax), subcategories (eg automobile:service, automobile:fuel) and tags (eg self, spouse, son). bank2csv renders the categories:subcategories/tag as a concatenated string. I want to instead put the category in a category column, subcategory in a subcategory column and put whatever tags in the tag column. I saw a similar question but alas that worked bystrsplit
then unlist
and then indexing each element so that it could be written to the correct place by assignment. That will not work here since sometimes there is no tag and sometimes there is no subcategory. It is quite easy to split the string into a list and save that list in a column but how on earth does one assign the first element of the list to one column and the second element (if it exists) to a second column. Surely there is an elegant easy way.
simplified sample
library(data.table)
library(stringi)
dt <- data.table(category.tag=c("toys/David", "toys/David", "toys/James", "toys", "toys", "toys/James"), transaction=1:6)
How do I create a third and fourth column: category, tag. Some of tag would be NA
I can do the following but it does not get me very far. I need a way to specify the first or the second element of the resultant list (as opposed to the whole list)
dt[, category:= strsplit(x = category.tag, split = "/") ]
Upvotes: 3
Views: 1963
Reputation: 118789
Just pushed two functions transpose()
and tstrsplit()
in data.table v1.9.5.
With this we can do:
require(data.table)
dt[, c("category", "tag") := tstrsplit(category.tag, "/", fixed=TRUE)]
# category.tag transaction category tag
# 1: toys/David 1 toys David
# 2: toys/David 2 toys David
# 3: toys/James 3 toys James
# 4: toys 4 toys NA
# 5: toys 5 toys NA
# 6: toys/James 6 toys James
tstrsplit
is a wrapper for transpose(strsplit(as.character(x), ...))
. And you can also pass fill=.
to fill missing values with any other value than NA
.
transpose()
can also be used on lists, data frames and data tables.
Upvotes: 6
Reputation: 269491
1) Try read.table
read <- function(x) read.table(text = x, sep = "/", fill = TRUE, na.strings = "")
dt[, c("category", "tag") := read(category.tag)]
No extra packages are needed.
2) An alternative is to use separate
in the tidyr package:
library(tidyr)
separate(dt, category.tag, c("category", "tag"), extra = "drop")
The above is with tidyr version 0.1.0.9000 from github. To install it ensure that the devtools R package is installed and issue the command: devtools::install_github("hadley/tidyr")
.
Update: Incorporated Richard Scrivens' comment and minor improvements. Added tidyr solution.
Upvotes: 2
Reputation: 193517
Since you already have "stringi" loaded, you can also look at stri_split_fixed
and the simplify
argument:
setnames(cbind(dt, stri_split_fixed(dt$category.tag, "/", simplify = TRUE)),
c("V1", "V2"), c("category", "tag"))[]
# category.tag transaction category tag
# 1: toys/David 1 toys David
# 2: toys/David 2 toys David
# 3: toys/James 3 toys James
# 4: toys 4 toys NA
# 5: toys 5 toys NA
# 6: toys/James 6 toys James
Though I must admit I'm partial to cSplit
:-)
Upvotes: 1
Reputation: 99331
You could use cSplit
library(splitstackshape)
dt[, c("category", "tag") := cSplit(dt[,.(category.tag)], "category.tag", "/")]
dt
# category.tag transaction category tag
# 1: toys/David 1 toys David
# 2: toys/David 2 toys David
# 3: toys/James 3 toys James
# 4: toys 4 toys NA
# 5: toys 5 toys NA
# 6: toys/James 6 toys James
Upvotes: 4