Recommended way to create variable, flexible dataframe?

Question

Given a table with a variety of values and lengths, what's the best way to create a dataframe for columnar analysis?

For example, given an unlabeled CSV that looks like this:

A,B,A,C
A,B,C,D,E,F
B,C,A,B,F,F,F
A,B
B,C,D
A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,Y,X,Z,AA,AB,AC

The goal is to eventually assign a value to each letter based on the position in which it appears.

Given the variable and unknown length of the rows, how should I approach this problem? Should I set up a dataframe with an absurdly large number of columns as a placeholder?

Shree · Accepted Answer

One option is to read each row as an element of a character vector using readLines():

x <- readLines("test.csv") # add appropriate path to the file
x
[1] "A,B,A,C"              "A,B,C,D,E,F"                                                 
[3] "B,C,A,B,F,F,F"        "A,B"                                                         
[5] "B,C,D"                "A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,Y,X,Z,AA,AB,AC"

Now you can manipulate each element of this vector as you wish and then assemble the results in your desired structure. This way you don't have to "Set up a dataframe with an absurdly large number of columns as a placeholder".
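
As one possible way to do the "manipulate and assemble" step, here is a minimal sketch that splits each line on commas and stacks the pieces into a long data frame with one record per value. The column names `row`, `position` and `letter` are my own choices for illustration, and the sample vector stands in for the result of readLines() on the first five rows of the question's file.

```r
# Sample rows as readLines() would return them (from the question's CSV).
x <- c("A,B,A,C", "A,B,C,D,E,F", "B,C,A,B,F,F,F", "A,B", "B,C,D")

# Split each line on commas; the result is a list of character vectors
# whose lengths differ from row to row.
parts <- strsplit(x, ",", fixed = TRUE)

# Number of values in each row.
n <- lengths(parts)

# Long format: one record per value, tagged with its source row and
# its position within that row. Column names are illustrative only.
long <- data.frame(
  row      = rep(seq_along(parts), times = n),
  position = sequence(n),
  letter   = unlist(parts),
  stringsAsFactors = FALSE
)

head(long)
```

Because the position is now an ordinary column, rows of any length fit without placeholder columns, and something like `table(long$letter, long$position)` would count how often each letter occurs at each position.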
