Given a table whose rows contain varying values and have varying lengths, what's the best way to create a dataframe for columnar analysis?
For example, given an unlabeled (headerless) CSV that looks like this:
A,B,A,C
A,B,C,D,E,F
B,C,A,B,F,F,F
A,B
B,C,D
A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,Y,X,Z,AA,AB,AC
The goal is eventually to assign a value to each letter based on the position in which it appears.
Given the variable and unknown length of the rows, how should I approach this problem? Should I set up a dataframe with an absurdly large number of columns as a placeholder?
One option is to read each row as an element of a character vector using readLines():
x <- readLines("test.csv") # add appropriate path to the file
x
[1] "A,B,A,C" "A,B,C,D,E,F"
[3] "B,C,A,B,F,F,F" "A,B"
[5] "B,C,D" "A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,Y,X,Z,AA,AB,AC"
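Each element is still one comma-separated string. A minimal sketch of the next step, assuming the lines from the example file (typed in directly so it runs without test.csv), splits them with strsplit() and, if a wide layout is still wanted, pads short rows with NA only up to the longest row rather than to a guessed placeholder width. The names rows, row_lengths, n_cols and wide are illustrative.

```r
# Assumption: x holds what readLines("test.csv") returns for the example
# file; the lines are typed in so this sketch runs on its own.
x <- c("A,B,A,C", "A,B,C,D,E,F", "B,C,A,B,F,F,F", "A,B", "B,C,D",
       "A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,Y,X,Z,AA,AB,AC")

# Split every line on commas: a list with one character vector per row,
# each keeping its own length.
rows <- strsplit(x, ",", fixed = TRUE)

# Number of values in each row.
row_lengths <- lengths(rows)
print(row_lengths)

# Optional wide layout: exactly as many columns as the longest row.
# Indexing past the end of a vector yields NA, which pads short rows.
n_cols <- max(row_lengths)
wide <- as.data.frame(
  do.call(rbind, lapply(rows, function(r) r[seq_len(n_cols)])),
  stringsAsFactors = FALSE
)
```

The column count comes from the data itself (29 here), so nothing has to be sized in advance.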
Now you can process each element of this vector as you wish (for example, split it on commas) and then assemble the results into your desired structure. This way you don't have to "Set up a dataframe with an absurdly large number of columns as a placeholder".
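Since the goal is to score letters by position, a long format (one record per value) may be easier to work with than a wide one. The sketch below assumes the same example lines; the scoring rule at the end (mean position per letter) is a hypothetical placeholder, not something stated in the question.

```r
# Assumption: same lines readLines("test.csv") would return.
x <- c("A,B,A,C", "A,B,C,D,E,F", "B,C,A,B,F,F,F", "A,B", "B,C,D",
       "A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,Y,X,Z,AA,AB,AC")
rows <- strsplit(x, ",", fixed = TRUE)

# One record per value: source row, position within that row, and value.
long <- data.frame(
  row      = rep(seq_along(rows), lengths(rows)),
  position = unlist(lapply(rows, seq_along)),
  letter   = unlist(rows),
  stringsAsFactors = FALSE
)
head(long)

# Hypothetical scoring rule: a letter's score is the mean position
# at which it appears across all rows.
scores <- aggregate(position ~ letter, data = long, FUN = mean)
```

Rows of any length fit this layout without padding, and any other position-based rule can replace the aggregate() call.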