
Reputation: 405

Data frame from a character vector that contains three comma-separated values in each row

How can I create the columns of a data frame from a long character vector in which each element contains three comma-separated values? The first element contains the names of the data frame columns.

Not every row has three values; in some places there is just a trailing comma:

> string.split.cols[1] #This row is the .names
[1] "Acronym,Full form,Remarks"
> string.split.cols[2]
[1] "AC,Actual Cost, "
> string.split.cols[3]
[1] "ACWP,Actual Cost of Work Performed,Old term for AC"
> string.split.cols[4]
[1] "ADM,Arrow Diagramming Method,Rarely used now"
> string.split.cols[5]
[1] "ADR,Alternative Dispute Resolution, "
> string.split.cols[6]
[1] "AE,Apportioned Effort, "

The output should be a data frame with three columns. I'm only interested in the first two columns and will throw out the third.

R is awesome

This is the original string. Some fields contain commas that are not escaped, but that isn't a big deal.

string.cols <- "Acronym,Full form,Remarks\nAC,Actual Cost, \nACWP,Actual Cost of Work Performed,Old term for AC\nADM,Arrow Diagramming Method,Rarely used now\nADR,Alternative Dispute Resolution, \nAE,Apportioned Effort, \nAOA,Activity-on-Arrow,Rarely used now\nAON,Activity-on-Node, \nARMA,Autoregressive Moving Average, \nBAC,Budget at Completion, \nBARF,Bought-into, Approved, Realistic, Formal,from Rita Mulcahy's PMP Exam Prep\nBCR,Benefit Cost Ratio, \nBCWP,Budgeted Cost of Work Performed,Old term for EV\nBCWS,Budgeted Cost of Work Scheduled,Old term for PV\nCA,Control Account, \nCBR,Cost Benefit Ratio, \nCBT,Computer-Based Test, \n..."

Upvotes: 0

Answers (3)

Matthew Lundberg

Reputation: 42679

You can use rbind.data.frame to do this, after splitting the string:

x <- do.call(rbind.data.frame, strsplit(string.split.cols[-1], ','))
names(x) <- strsplit(string.split.cols[1], ',')[[1]]
x
##  Acronym                     Full form         Remarks
## 1      AC                   Actual Cost                
## 2    ACWP Actual Cost of Work Performed Old term for AC
## ...

As a one-liner:

setNames(do.call(rbind.data.frame,
                 strsplit(string.split.cols[-1], ',')
         ),
         strsplit(string.split.cols[1], ',')[[1]]
)
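For a version that can be pasted and run on its own, here is a minimal sketch of the same split-then-bind idea. It is a variant, not the exact call above: it binds the split pieces into a character matrix with plain rbind and converts that to a data frame. The sample rows are copied from the question, and the sketch assumes every row splits into exactly three pieces (the length check stops early if one does not, as the BARF row in the full string would).

```r
# Sample rows copied from the question (well-formed rows only)
string.split.cols <- c(
  "Acronym,Full form,Remarks",
  "AC,Actual Cost, ",
  "ACWP,Actual Cost of Work Performed,Old term for AC",
  "ADM,Arrow Diagramming Method,Rarely used now"
)

# Split every data row on literal commas
parts <- strsplit(string.split.cols[-1], ",", fixed = TRUE)

# Assumption: each row has exactly three fields; stop early otherwise
stopifnot(all(lengths(parts) == 3))

# Bind the pieces into a character matrix, one row per input string
m <- do.call(rbind, parts)
x <- as.data.frame(m, stringsAsFactors = FALSE)

# The first element of the vector supplies the column names
names(x) <- strsplit(string.split.cols[1], ",", fixed = TRUE)[[1]]

# Keep only the first two columns, as the question asks
x <- x[, 1:2]
```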

Upvotes: 1

Dave2e

Reputation: 24139

I found this routine to be very fast for splitting a string and converting to a data frame.

slist <- strsplit(mylist, ",")
x <- sapply(slist, function(parts) parts[1])
y <- sapply(slist, function(parts) parts[2])
df <- data.frame(Column1Name = x, Column2Name = y, stringsAsFactors = FALSE)

where mylist is your vector of strings to split.
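One caveat: taking the second split piece truncates any row whose full form itself contains commas, such as the BARF row in the question. A rough sketch of one way around that, assuming the acronym and the remarks never contain commas, is to cut at the first and last comma with a regular expression instead of splitting on every comma. The column names Acronym and FullForm are illustrative choices, not something from the question.

```r
# Two rows copied from the question; the second has unescaped commas
mylist <- c(
  "AC,Actual Cost, ",
  "BARF,Bought-into, Approved, Realistic, Formal,from Rita Mulcahy's PMP Exam Prep"
)

# Group 1: text before the first comma (acronym)
# Group 2: everything between the first and the last comma (full form)
# Group 3: text after the last comma (remarks, assumed to be comma-free)
pattern <- "^([^,]*),(.*),([^,]*)$"

df <- data.frame(
  Acronym  = sub(pattern, "\\1", mylist),
  FullForm = trimws(sub(pattern, "\\2", mylist)),
  stringsAsFactors = FALSE
)
```

If the remarks can also contain commas, the input is ambiguous and no splitting rule will recover the fields reliably.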

Upvotes: 1

rosscova

Reputation: 5600

Have you tried the text input for read.csv?

df <- read.csv(text = string.split.cols, header = TRUE)
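As a self-contained illustration, here is a small sketch of that call on a few rows copied from the question, followed by dropping the third column. The stringsAsFactors and strip.white arguments are additions for this example rather than part of the suggestion above. Note that with the default check.names = TRUE the header "Full form" is converted to the syntactic name Full.form.

```r
# Sample rows copied from the question (well-formed rows only)
string.split.cols <- c(
  "Acronym,Full form,Remarks",
  "AC,Actual Cost, ",
  "ACWP,Actual Cost of Work Performed,Old term for AC",
  "ADM,Arrow Diagramming Method,Rarely used now"
)

# Each element of the vector is read as one line of CSV input
df <- read.csv(
  text = string.split.cols,
  header = TRUE,
  stringsAsFactors = FALSE,  # keep text columns as character
  strip.white = TRUE         # drop the stray space after trailing commas
)

# Keep only the first two columns, as the question asks
df <- df[, 1:2]
```

The text argument should also accept the original newline-delimited string (string.cols) directly, so the intermediate split into one element per row is not strictly needed for this approach. Rows with extra unescaped commas, like the BARF row, will still not parse into three clean fields.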

Upvotes: 3
