Reputation: 11

How to upload CSV files to GitHub repo and use them as data for my R scripts

I'm working on a project that uses R to process some large CSV files stored in a local directory linked to my repo.

So far, I have created the R project and committed and pushed the R scripts to the repo with no problems.

However, the scripts read the data from the CSV files in my local directory, so the code looks like this:

df <- read.csv("mylocaldirectorylink")

This is not helpful, because my partner and I, who work on the same project, each have to change that path to our own local directory every time we pull from the repo. So I was thinking that we could upload the CSV files to the GitHub repo and let the R script refer directly to the files online.

So my questions are:

Why can't I upload CSV files to GitHub? It keeps saying that my file is too large.
If I can upload the CSV files, how do I read the data from them?

Upvotes: 1

Answers (2)

mufassir

Reputation: 572

This is for a Python script.

If your CSV files are currently being ignored, you can make Git track them by editing your .gitignore file.

OR

You can add the CSV files to your GitHub repo through the website, so that others can use them.

I did so with the following steps:

1. Check out the branch on github.com.
2. Go to the folder where you want to keep the CSV files.
3. Use the "Add file" option in the top right area of the page.

From there you can upload the CSV files and commit the changes to the same branch or to a new branch.
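
Once a file is committed to a public repo, R can read it straight from its raw URL, since read.csv() accepts a URL. A minimal sketch, assuming a hypothetical user, repo, branch and file path (none of these come from the question), and a hypothetical helper github_raw_url():

```r
# Build the raw-content URL for a file in a public GitHub repo.
# All argument values used below are hypothetical placeholders.
github_raw_url <- function(user, repo, branch, path) {
  sprintf("https://raw.githubusercontent.com/%s/%s/%s/%s",
          user, repo, branch, path)
}

csv_url <- github_raw_url("someuser", "somerepo", "main", "data/data.csv")
print(csv_url)

# read.csv() accepts a URL, so the script needs no local path.
# Left commented out because the placeholder repo above does not exist:
# df <- read.csv(csv_url)
```

This only helps for files GitHub accepts: it blocks files above 100 MiB on push and applies a lower limit to browser uploads, which is the likely cause of the "too large" message.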

Upvotes: 0

Alex Knorre

Reputation: 629

Firstly, it's generally a bad idea to store data on GitHub, especially if it's large. If you want to keep it somewhere on the Internet, you can use, say, Dataverse and access your data by URL (through the API), or Google Drive, as Jake Kaupp suggested.

Now back to your question. If your data doesn't change, I would simply use relative paths to the CSV instead of absolute ones. In other words, instead of

df <- read.csv("C:/folder/subfolder/data.csv")

I would use

df <- read.csv("../data.csv")

If you are working in an R project, the initial working directory is the project folder. You can check it with getwd(). Because the working directory moves with the project, a relative path keeps working on any machine. Just agree with your colleague that the data file should sit in the folder that contains the R project folder.
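
A rough sketch of that layout, in case it helps. The folder names (relative_path_demo, my_project) and the sample data are made up for illustration, and setwd() stands in for what opening the R project does for you:

```r
# Hypothetical layout: <parent>/data.csv next to <parent>/my_project/
parent  <- file.path(tempdir(), "relative_path_demo")
project <- file.path(parent, "my_project")
dir.create(project, recursive = TRUE, showWarnings = FALSE)

# Made-up sample data, written one level above the project folder.
write.csv(data.frame(id = 1:3, value = c(10, 20, 30)),
          file.path(parent, "data.csv"), row.names = FALSE)

# Opening an R project sets the working directory to the project folder;
# setwd() imitates that here and returns the previous directory.
old_wd <- setwd(project)

# Relative path: one level up from the project folder.
data_path <- file.path("..", "data.csv")
if (!file.exists(data_path)) {
  stop("data.csv not found next to the project folder; check getwd()")
}
df <- read.csv(data_path)

setwd(old_wd)  # restore the original working directory
print(df)
```

The file.exists() check is optional, but it gives a clearer error than read.csv() when a collaborator has put the file somewhere else.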

Upvotes: 2
