Reputation: 405
How can I create the columns of a data frame from a long character vector in which each element holds three comma-separated values? The first element contains the names of the data frame columns.
Not every row has three values; in some rows there is just a trailing comma:
> string.split.cols[1] #This row is the .names
[1] "Acronym,Full form,Remarks"
> string.split.cols[2]
[1] "AC,Actual Cost, "
> string.split.cols[3]
[1] "ACWP,Actual Cost of Work Performed,Old term for AC"
> string.split.cols[4]
[1] "ADM,Arrow Diagramming Method,Rarely used now"
> string.split.cols[5]
[1] "ADR,Alternative Dispute Resolution, "
> string.split.cols[6]
[1] "AE,Apportioned Effort, "
The output should be a data frame with three columns. I'm only interested in the first two columns and will throw out the third.
This is the original string. Some fields contain commas that are not escaped, but that isn't a big deal.
string.cols <- "Acronym,Full form,Remarks\nAC,Actual Cost, \nACWP,Actual Cost of Work Performed,Old term for AC\nADM,Arrow Diagramming Method,Rarely used now\nADR,Alternative Dispute Resolution, \nAE,Apportioned Effort, \nAOA,Activity-on-Arrow,Rarely used now\nAON,Activity-on-Node, \nARMA,Autoregressive Moving Average, \nBAC,Budget at Completion, \nBARF,Bought-into, Approved, Realistic, Formal,from Rita Mulcahy's PMP Exam Prep\nBCR,Benefit Cost Ratio, \nBCWP,Budgeted Cost of Work Performed,Old term for EV\nBCWS,Budgeted Cost of Work Scheduled,Old term for PV\nCA,Control Account, \nCBR,Cost Benefit Ratio, \nCBT,Computer-Based Test, \n..."
Upvotes: 0
Views: 65
Reputation: 42679
You can use rbind.data.frame to do this, after splitting the string:
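# Split every row except the header on commas and stack the pieces row-wise
x <- do.call(rbind.data.frame, strsplit(string.split.cols[-1], ','))
# The first element holds the column names
names(x) <- strsplit(string.split.cols[1], ',')[[1]]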
x
##   Acronym                     Full form         Remarks
## 1      AC                   Actual Cost
## 2    ACWP Actual Cost of Work Performed Old term for AC
## ...
As a one-liner:
setNames(
  do.call(rbind.data.frame, strsplit(string.split.cols[-1], ',')),
  strsplit(string.split.cols[1], ',')[[1]]
)
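The question notes that some rows contain unescaped commas (the BARF row), so strsplit returns more than three pieces for them. A minimal sketch of one way to cope, assuming the acronym and the remark never contain a comma themselves, is to treat the first piece as the acronym and glue everything between the first and last piece back together as the full form. The names rows and middle are only illustrative:

```r
# Sample rows copied from the question; the BARF row has unescaped commas.
rows <- c(
  "Acronym,Full form,Remarks",
  "AC,Actual Cost, ",
  "ACWP,Actual Cost of Work Performed,Old term for AC",
  "BARF,Bought-into, Approved, Realistic, Formal,from Rita Mulcahy's PMP Exam Prep"
)

# Split every row except the header on literal commas.
parts <- strsplit(rows[-1], ",", fixed = TRUE)

# Assumption: first piece = acronym, last piece = remark,
# everything in between belongs to the full form.
middle <- function(p) {
  if (length(p) < 3) return(p[2])  # no separate remark piece survived the split
  paste(p[2:(length(p) - 1)], collapse = ",")
}

df <- data.frame(
  Acronym     = vapply(parts, function(p) p[1], character(1)),
  `Full form` = vapply(parts, middle, character(1)),
  stringsAsFactors = FALSE,
  check.names = FALSE  # keep the space in "Full form"
)
```

This keeps only the two columns the question cares about; if a remark could itself contain a comma, the split would be ambiguous and the data would need proper quoting instead.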
Upvotes: 1
Reputation: 24139
I found this routine to be very fast for splitting a string and converting to a data frame.
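# Split each string on commas; the result is a list of character vectors
slist <- strsplit(mylist, ",")
# Pull the first and second field out of every row
x <- sapply(slist, function(x) x[1])
y <- sapply(slist, function(x) x[2])
df <- data.frame(Column1Name = x, Column2Name = y, stringsAsFactors = FALSE)
where mylist is your vector of strings to split.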
Upvotes: 1
Reputation: 5600
Have you tried the text argument of read.csv?
df <- read.csv(text = string.split.cols, header = TRUE)
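The text argument also accepts a single newline-separated string, so the original string can probably be parsed without splitting it first. A small sketch, assuming a shortened copy of the question's string in which no field contains an unescaped comma (rows like BARF would need separate handling):

```r
# Shortened copy of the newline-separated string from the question
# (rows with unescaped commas are left out on purpose).
string.cols <- "Acronym,Full form,Remarks\nAC,Actual Cost, \nACWP,Actual Cost of Work Performed,Old term for AC\nADM,Arrow Diagramming Method,Rarely used now"

# check.names = FALSE keeps "Full form" instead of turning it into "Full.form".
df <- read.csv(text = string.cols, header = TRUE,
               stringsAsFactors = FALSE, check.names = FALSE)

# Keep only the first two columns, as the question asks.
df <- df[, 1:2]
```

Here stringsAsFactors = FALSE keeps the fields as plain character columns regardless of the R version in use.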
Upvotes: 3