Reputation: 11
I'm currently doing a project that uses R to process some large csv files that are saved in my local directory linked to my repo.
So far, I managed to create the R project and commit and push R scripts into the repo with no problem.
However, the scripts read the data from CSV files saved in my local directory, so the code takes the form
df <- read.csv("mylocaldirectorylink")
This is not practical, because my partner and I are working on the same project and each of us would have to change that path to our own local directory every time we pull from the repo. So I was thinking that we could upload the CSV files to the GitHub repo and let the R script refer directly to the CSV files online.
So my questions are:
Upvotes: 1
Views: 7843
Reputation: 572
This is for a Python script.
You can have Git track CSV files by editing your .gitignore file so that it no longer excludes them.
**OR**
You can add the CSV files to your GitHub repo, where others can use them.
I did so by following steps:
Upvotes: 0
Reputation: 629
Firstly, it's generally a bad idea to store data on GitHub, especially if it's large. If you want to keep it somewhere on the Internet, you can use, say, Dataverse, and then access your data by URL (through the API), or Google Drive, as Jake Kaupp suggested.
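Wherever the file ends up hosted, read.csv() accepts a complete URL in place of a local path. As a rough sketch for the GitHub case raised in the question: the helper github_raw_url() below is hypothetical (not part of any package), and the user, repo, branch, and path values are placeholders rather than a real repository.

```r
# Hypothetical helper: builds the raw-content URL for a file in a public GitHub repo
github_raw_url <- function(user, repo, branch, path) {
  paste("https://raw.githubusercontent.com", user, repo, branch, path, sep = "/")
}

# Placeholder values, not a real repository
csv_url <- github_raw_url("some-user", "some-repo", "main", "data/data.csv")
print(csv_url)

# With a real, publicly readable URL, read.csv() can fetch the file directly:
# df <- read.csv(csv_url)
```

The read.csv() call is left commented out because the placeholder URL does not point to an actual file.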
Now back to your question. If your data doesn't change, I would simply use relative paths to the CSV files rather than absolute ones. In other words, instead of
df <- read.csv("C:/folder/subfolder/data.csv")
I would use
df <- read.csv("../data.csv")
If you are working with an R project, the initial working directory is the project's own folder. You can check it with getwd(). This working directory follows the project wherever you move it, so relative paths keep working. Just agree with your colleague that the data file should sit in the folder that contains the R project folder, which is where "../data.csv" points.
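To see the relative-path idea in action without touching real files, here is a minimal sketch that simulates the layout in a temporary directory. The folder names (shared_..., myproject) and the example data are made up for illustration, and setwd() stands in for what opening the .Rproj file would normally do.

```r
# Simulate the layout: <parent>/data.csv sitting next to <parent>/myproject/
parent  <- tempfile("shared_")               # hypothetical parent folder
project <- file.path(parent, "myproject")    # hypothetical R project folder
dir.create(project, recursive = TRUE)

# Put a small example CSV beside the project folder
write.csv(data.frame(id = 1:2, value = c(10, 20)),
          file.path(parent, "data.csv"), row.names = FALSE)

# Opening the project would make its folder the working directory;
# setwd() imitates that here and returns the previous directory
old_wd <- setwd(project)
print(getwd())

# The same relative path works on any machine that uses this layout
df <- read.csv("../data.csv")
print(df)

setwd(old_wd)  # restore the original working directory
```

Because the path is resolved against the working directory, neither collaborator needs to edit the script as long as both keep data.csv one level above the project folder.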
Upvotes: 2