Reputation: 8824
Edit:
So, this format would work:
featureID charge xcoordinate ycoordinate
1 2 5105.9217 336.125209180674
1 2 5108.7642 336.124751115092
2 0 2434.9217 145.893331325278
But what if I have two columns that each hold multiple linked values? Say a quality column pairs a machine with a quality value, and the column looks like this:
MachineQuality
[[{1:1224}, {2:3453}], [{1:2242}, {2:4142}]]
Now if I want to split that up like I did with the coordinates of the convexhull, I would need 2 rows instead of 1. But wouldn't I then need 2 rows for every row that is already there (so 4 in total, because there are already 2 rows for the coordinates), like this:
featureID charge xcoordinate ycoordinate quality1 quality2
1 2 5105.9217 336.125209180674 1224 3453
1 2 5105.9217 336.125209180674 2242 4142
1 2 5108.7642 336.124751115092 1224 3453
1 2 5108.7642 336.124751115092 2242 4142
[...]
Would it have to be like this?
I'm very new to R, my knowledge doesn't go much further than knowing how to make a vector and some simple plots. I'm going to use R for an internship project the next couple of months and during this time I will (hopefully) learn some of the ins and outs of R. However, before I start I need to produce the data that I'm going to do the statistics on. I need to know beforehand how I should format my output CSV data so that I can easily read it in once I start my R analysis.
One thing I've been asked to do is make a CSV file out of the data so that it can be read in by R. The example CSV files for importing into R that I've seen all look like this:
featureID Charge value
1 2 10
2 0 9
However, my data mostly consists of columns whose cells contain multiple values. To clarify: my data consists of "features" that, among other information, have a "convexhull". This convexhull consists of paired x and y coordinates. So my data could look like this (only two coordinates are shown; there can be many):
featureID Charge Convexhull
1 2 [[{'y': '336.125209180674'}, {'x': '5105.9217'}], [{'y': '336.124751115092'}, {'x': '5108.7642'}]]
Is it possible to get this into one CSV file and read it into R correctly (so that the paired x and y coordinates are preserved)? If so, what should the CSV file look like? For example, I've seen CSV files with multiple values that look like this:
featureID charge xcoordinate ycoordinate
1 2 5105.9217 336.125209180674
5108.7642 336.124751115092
2 0 2434.9217 145.893331325278
But I can't find out whether this is easily imported by R.
If this is not doable in one CSV file, can the CSV files easily be imported independently and then linked with a primary key, as in a database?
Upvotes: 2
Views: 1254
Reputation: 269556
long vs. wide form. Your last example is known as long form (except all cells should be filled in) and your first example is roughly wide form, as discussed on the ?reshape page and illustrated in the examples at the end of that page. You likely want to stick with long form. For an alternative see the reshape2 package.
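To make the two layouts concrete, here is a minimal sketch using base R's reshape(). The column names come from the question; the `point` column, which numbers the coordinates within each feature, is a hypothetical addition that reshape() needs in order to know which wide column each row goes into.

```r
# Long form: one row per (feature, point), with every cell filled in.
long <- data.frame(
  featureID   = c(1, 1, 2),
  charge      = c(2, 2, 0),
  point       = c(1, 2, 1),  # hypothetical index of the coordinate within a feature
  xcoordinate = c(5105.9217, 5108.7642, 2434.9217),
  ycoordinate = c(336.125209180674, 336.124751115092, 145.893331325278)
)

# Wide form: one row per feature, one pair of columns per point.
# Features with fewer points get NA in the unused columns.
wide <- reshape(long,
                idvar     = c("featureID", "charge"),
                timevar   = "point",
                v.names   = c("xcoordinate", "ycoordinate"),
                direction = "wide")
print(wide)
```

With a variable number of points per feature, the wide layout fills up with NA values, which is one reason long form is usually easier to work with here.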
save & load. Note that if you are only writing it out to read it back in to R later (as opposed to communicating it to some other software) you could use save and load, which don't require any change to the object at all.
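A small sketch of that round trip, assuming the data already lives in R as a nested list. The `features` structure below is only an illustration of how a feature with its convexhull might be held in R; it is not taken from the question.

```r
# Hypothetical in-memory representation: each feature carries its own
# data frame of paired x/y coordinates.
features <- list(
  list(featureID = 1, charge = 2,
       convexhull = data.frame(x = c(5105.9217, 5108.7642),
                               y = c(336.125209180674, 336.124751115092))),
  list(featureID = 2, charge = 0,
       convexhull = data.frame(x = 2434.9217, y = 145.893331325278))
)

path <- tempfile(fileext = ".RData")
save(features, file = path)  # writes the object as-is, nesting included

rm(features)                 # simulate a fresh session
load(path)                   # restores an object named `features`
print(features[[1]]$convexhull)
```

This only helps when the data is produced from within R; a file written by other software still needs a format such as CSV or JSON.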
json. Another possibility given the form of your example is that you might want to look at the rjson package.
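As a rough sketch of the JSON route: the string below is an assumed layout, not the exact text from the question (which uses single quotes and would not parse as JSON). It relies on rjson::fromJSON and only runs if the rjson package is installed.

```r
# Assumed JSON layout: one object per feature, with the convexhull as an
# array of {x, y} objects so each pair stays together.
json <- '[{"featureID": 1, "charge": 2,
           "convexhull": [{"x": 5105.9217, "y": 336.125209180674},
                          {"x": 5108.7642, "y": 336.124751115092}]}]'

if (requireNamespace("rjson", quietly = TRUE)) {
  features <- rjson::fromJSON(json)   # nested lists mirroring the JSON
  hull <- features[[1]]$convexhull

  # Flatten the hull of the first feature into long form.
  hull_df <- data.frame(
    featureID   = features[[1]]$featureID,
    xcoordinate = sapply(hull, function(p) p$x),
    ycoordinate = sapply(hull, function(p) p$y)
  )
  print(hull_df)
}
```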
Upvotes: 2
Reputation: 23758
The only critical things are that you have a unique character separating your data columns and that each column is the same length. As long as the second row in your last example is filled in that will import fine.
You need to consider what you want to do with the data after it's in R to decide how you might want any other special formatting beforehand. But, as long as the column separator is a unique character and the columns are of equal length then it will import.
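For illustration, here is a minimal sketch of importing the filled-in long layout from the question with read.csv. The comma separator is an assumption, and the text is read from a string only to keep the example self-contained; with a real file you would pass its path instead.

```r
# The question's last layout with every cell filled in, comma-separated.
csv_text <- "featureID,charge,xcoordinate,ycoordinate
1,2,5105.9217,336.125209180674
1,2,5108.7642,336.124751115092
2,0,2434.9217,145.893331325278"

features <- read.csv(text = csv_text)
str(features)

# All coordinates of feature 1; each row keeps its x and y together.
hull1 <- features[features$featureID == 1, c("xcoordinate", "ycoordinate")]
print(hull1)
```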
(You can violate the unique separator requirement if your entries are wrapped in quotes. And if you want to get really fancy you could "import" almost anything. But if someone's asking you to format the data then they probably want a rectangular data.frame compatible layout. They probably want unique values in each column (no columns of points). But that's between you and them.)
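A sketch of the quoting exception: the cell encoding used here (points separated by `;`, x and y separated by `,`) is a hypothetical convention chosen for the example. Because the cell is wrapped in quotes, read.csv treats the commas inside it as data rather than as column separators.

```r
csv_text <- 'featureID,charge,convexhull
1,2,"5105.9217,336.125209180674;5108.7642,336.124751115092"
2,0,"2434.9217,145.893331325278"'

features <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(features)  # three columns; convexhull arrives as a character string

# Unpack the first feature's cell into a numeric matrix of x/y pairs.
pts <- strsplit(features$convexhull[1], ";")[[1]]
coords <- do.call(rbind, lapply(strsplit(pts, ","), as.numeric))
colnames(coords) <- c("x", "y")
print(coords)
```

The file imports, but every analysis then has to unpack the string first, which is the practical argument for one value per cell.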
Upvotes: 2