Reputation: 23
I am just starting with R and trying to learn ways of working with CSV files.
Sample Data Set
Org_Name Question# Response(scales from 1 through 5)
Org1 1 1
Org1 2 3
Org1 3 5
Org2 1 4
Org2 2 2
Org2 3 3
Org3 1 4
Org3 2 1
Org3 3 5
I am trying to figure out how to do some data analysis using R.
So my questions for you all are these:
Is R even a good tool for this? I am not sure whether Excel would be a better choice (I am more comfortable with Excel).
How does one work with a table in R? For example, I want to check which Org Names have scored high (4-5) in Question#2 and low (1-2) in Question#1. How frequently does that happen? Is there a method to do this?
Are there any good tutorials/resources for learning R? I understand that R is a great choice for data analysis and I would like to learn more about it.
Upvotes: 2
Views: 142
Reputation: 9
if you're a beginner, downloading some packages will help you a lot. here is some example code for your questions using the dplyr package:
1) R is a great tool for any kind of data manipulation or analysis, and reading csv files is very easy:
dat <- read.csv ("path")
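To make the read.csv step concrete, here is a minimal sketch in base R. It writes the sample data from the question to a temporary file and reads it back. The column names Org_Name, Question and Response, and the variable csv_path, are assumptions for illustration, since the real file's header is not shown.

```r
# Write the question's sample data to a temporary CSV file
# (the header names are assumed, not taken from a real file).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Org_Name,Question,Response",
  "Org1,1,1", "Org1,2,3", "Org1,3,5",
  "Org2,1,4", "Org2,2,2", "Org2,3,3",
  "Org3,1,4", "Org3,2,1", "Org3,3,5"
), csv_path)

# Read it into a data frame; keep text columns as plain character strings.
dat <- read.csv(csv_path, stringsAsFactors = FALSE)

# Inspect the column types and the first few rows.
str(dat)
head(dat)
```

str() and head() are a quick way to confirm that the columns were parsed with the names and types you expect before doing any filtering.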
2) once you read your csv file into an object, like "dat" above, the dplyr package has a bunch of functions to do pretty much any manipulation, e.g., your question of "check which Org Names have scored high (4-5) in Question#2 and Low (1-2) in Question#1."
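this will give you the Org_Names that satisfy the conditions you specified (your sample has one row per answer, so i'm assuming the columns are named Org_Name, Question and Response, and the check is done per Org):
library(dplyr)
dat %>% group_by(Org_Name) %>%
  filter(any(Question == 2 & Response >= 4), any(Question == 1 & Response <= 2)) %>%
  distinct(Org_Name)
and how frequently, i'm guessing you want a count?
dat %>% group_by(Org_Name) %>%
  filter(any(Question == 2 & Response >= 4), any(Question == 1 & Response <= 2)) %>%
  distinct(Org_Name) %>% nrow()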
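Since the sample data has one row per answer, comparing two questions for the same Org can be easier once each question has its own column. Here is a minimal sketch using base R's reshape(), assuming the columns are named Org_Name, Question and Response. The data frame is rebuilt inline so the example runs on its own, and the names wide, asked and reversed are made up for illustration.

```r
# Sample data from the question in long format (one row per answer).
dat <- data.frame(
  Org_Name = rep(c("Org1", "Org2", "Org3"), each = 3),
  Question = rep(1:3, times = 3),
  Response = c(1, 3, 5, 4, 2, 3, 4, 1, 5),
  stringsAsFactors = FALSE
)

# One row per Org, one column per question:
# Response.1, Response.2, Response.3.
wide <- reshape(dat, idvar = "Org_Name", timevar = "Question",
                direction = "wide")

# Orgs that scored high (4-5) on question 2 and low (1-2) on question 1.
asked <- wide$Org_Name[wide$Response.2 >= 4 & wide$Response.1 <= 2]
length(asked)  # how many Orgs match

# The reverse pattern: high on question 1, low on question 2.
reversed <- wide$Org_Name[wide$Response.1 >= 4 & wide$Response.2 <= 2]
reversed
```

With the nine sample rows, no Org matches the pattern asked about, while the reverse pattern matches Org2 and Org3.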
Upvotes: 0
Reputation: 522712
1) R is a great tool for handling your CSV data. In a few minutes, you can download R and RStudio and be up and running.
Here is some sample code which shows you how to get started:
sample <- data.frame(Org_Name = c(rep("Org1", 3), rep("Org2", 3), rep("Org3", 3)),
Question = c(1,2,3,1,2,3,1,2,3),
Response = c(1,3,5,4,2,3,4,1,5))
2) This defines a data frame called sample and assigns your data to it. To find all Orgs which scored 4 or higher on question 2, you can use this:
> sample$Org_Name[sample$Response >= 4 & sample$Question == 2]
factor(0)
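This returns factor(0), which means that no Orgs match (in R 4.0 and later, where strings are no longer converted to factors by default, you will see character(0) instead). However, if you want to find out which Orgs have a low response for question 2, you can try: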
> sample$Org_Name[sample$Response <= 2 & sample$Question == 2]
[1] Org2 Org3
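The two lookups above each test a single question. To check both conditions from the original question at once (high on question 2 and low on question 1), one option is to take the intersection of the two result sets. This is only a sketch: the data frame is rebuilt here under the hypothetical name survey so it does not mask base R's sample() function, and stringsAsFactors = FALSE is set so the result is the same across R versions.

```r
# Same data as above, under a name that does not shadow base::sample.
survey <- data.frame(
  Org_Name = rep(c("Org1", "Org2", "Org3"), each = 3),
  Question = rep(1:3, times = 3),
  Response = c(1, 3, 5, 4, 2, 3, 4, 1, 5),
  stringsAsFactors = FALSE
)

# Orgs with a high response (4-5) on question 2.
high_q2 <- survey$Org_Name[survey$Question == 2 & survey$Response >= 4]

# Orgs with a low response (1-2) on question 1.
low_q1 <- survey$Org_Name[survey$Question == 1 & survey$Response <= 2]

# Orgs that satisfy both conditions.
both <- intersect(high_q2, low_q1)

# How often it happens: a count, and a share of all Orgs.
length(both)
length(both) / length(unique(survey$Org_Name))
```

On this sample only Org1 is low on question 1 and no Org is high on question 2, so the intersection is empty and both the count and the share are 0.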
3) Google is a great place to start for finding R resources. The official R documentation is good too.
Upvotes: 2