Reputation:
I'm currently working in a psychology lab and beginning data analysis on response time data from a task.
The task runs over multiple trials, which makes the data disorganized to look at, especially now that my initial job was to merge all the data into a single data frame. In the vertical orientation of the data, we can see the participant ID and the response time. Great, those are important bits of information. However, instead of seeing trial numbers and such, we just see the data represented like this:
Participant 1, 23
Participant 1, 22
Participant 1, 25
Participant 2, 36
It goes on like that, repeating participant IDs (our sample size goes well into the thousands, so our data frame is very long). We can't pick out the important info or see which trial is which. So, we want a horizontal representation.
Now, I am using R for the data analysis, but I am a bit new to R and this is my first project with it. While I have done online R courses, you really learn it best when working with actual data.
In an effort to fix my data I have been looking into the packages reshape and tidyr: reshape has melt and cast, which could help me, and tidyr has pivot_wider, which I think could help me more than melt and cast.
I have been playing around with both using a smaller data frame from my actual data as a means of testing out code.
I used pivot_wider at first:
newdf1_test %>%
pivot_wider(names_from = name, values_from = V4)
I got a tibble, but it only had one participant's ID and one response time value.
I also got a warning message stating that values in V4 are not uniquely defined, and I was given suggestions on how to bypass the warning. All of the suggestions just returned an error about a data frame, saying the replacement has 1 row and the data has 0. What does this mean exactly?
I'm just not sure how this works yet. When I melt the data frame, I'm not sure what to do afterward, because all I see is still a long data frame rather than a wide one.
melt_testdf <- melt(newdf1_test, name = c("SID"), V4 = c("response_time"))
I was under the impression that this would add two new variables, SID and response_time, which would help me make two different data tables and then transpose them to make the merged data frame horizontal. But no: the new data frame that was returned showed name (which has the participant's ID), variable, which contains just the value V4 (V4 was the original name of the column holding response time), and value, which is where the response times ended up.
I know I am supposed to cast in order to reshape the data further, but since I'm already confused I don't want to proceed.
What am I to do? I'm so confused by this right now, and no matter how much I read I am not getting anywhere.
Upvotes: 0
Views: 722
Reputation: 344
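The message you're seeing appears because, with only the name and V4 columns, pivot_wider has no column left to identify rows. It assumes there is only one row, so it needs a way to aggregate the several V4 results that land in each participant's cell.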
If you want to have multiple rows you would need to supply an argument or data that will let the new wide table have a meaningful way to designate new rows.
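To make the problem concrete, here is a minimal sketch of what seems to happen with the data in the question. It assumes the data frame has only the name and V4 columns; two_col_df and collapsed are hypothetical names used just for this illustration.

```r
library(tidyr)

# Hypothetical stand-in for the question's data: only name and V4, no trial column.
two_col_df <- tribble(
  ~name,           ~V4,
  "Participant 1", 23,
  "Participant 1", 22,
  "Participant 1", 25,
  "Participant 2", 36
)

# No column is left over to identify rows, so the result has a single row and
# each cell is a list holding every value for that participant.
# suppressWarnings() hides the "not uniquely identified" warning for the demo.
collapsed <- suppressWarnings(
  pivot_wider(two_col_df, names_from = name, values_from = V4)
)

nrow(collapsed)                    # one row
collapsed[["Participant 1"]][[1]]  # all of Participant 1's response times
```

So the single-row tibble is not data loss: the values are still there, just packed into list-columns because nothing distinguishes one trial from the next.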
Here's an example where I've supplied an id for the new table:
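library(tidyr)  # provides tribble(), pivot_wider() and the %>% pipe

# test identifies the trial, so each (test, name) pair occurs only once
newdf1_test <- tribble(
  ~test, ~name, ~V4,
  '001', 'Participant 1', 23,
  '002', 'Participant 1', 22,
  '003', 'Participant 1', 25,
  '001', 'Participant 2', 36)

# one column per participant, one row per value of test
newdf1_test %>%
  pivot_wider(
    names_from = name,
    values_from = V4)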
# A tibble: 3 x 3
  test  `Participant 1` `Participant 2`
  <chr>           <dbl>           <dbl>
1 001                23              36
2 002                22              NA
3 003                25              NA
Essentially, in this version the id_cols argument of pivot_wider is implicit: it defaults to every column not used in names_from or values_from, which here is the test variable. You can also see that the new data table makes sense in a way that it wouldn't without the test variable.
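If your real data has no trial column, one way to get the layout you describe (one row per participant, one column per trial) might be to number the rows within each participant first. This is only a sketch: it assumes the rows for each participant are already in trial order, it uses dplyr in addition to tidyr, and long_df, wide_df, trial and the "trial_" prefix are names I made up for the example.

```r
library(dplyr)
library(tidyr)

# Toy data shaped like the question's: one row per trial, no trial column.
long_df <- tribble(
  ~name,           ~V4,
  "Participant 1", 23,
  "Participant 1", 22,
  "Participant 1", 25,
  "Participant 2", 36
)

wide_df <- long_df %>%
  group_by(name) %>%
  mutate(trial = row_number()) %>%  # assumption: row order equals trial order
  ungroup() %>%
  pivot_wider(
    names_from   = trial,       # one new column per trial number
    values_from  = V4,          # filled with the response times
    names_prefix = "trial_"     # gives trial_1, trial_2, ...
  )

wide_df
```

Here name is the only column left over, so it identifies the rows, and participants with fewer trials get NA in the columns they are missing.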
I hope that makes sense!
Upvotes: 2