SoAni
SoAni

Reputation: 139

How to restructure dataframe from single column value into repeated column rows

I am trying to change the format of my data frame based on the values of a single column data to represent repeated sequences of another. My dataset consists of a file name, number of observations, date, and identification of species. With the dataset as it is (see below), I only manage to plot the values for which n>0, but I would like to to keep the zeros to have an overview of the proportions without losing the amount of observations.

I have tried tidyr::spread function but I don't want to create new columns based on the values of that single one. I want other columns (e.g., file.name, Date, ID) to be repeated based on the values of the first one (n). tidyr::melt also doesn't seem to do the trick for me...

Here's an example:

call_obs <- data.frame("file.name" = c("pa0095au_001_180315_192129.wav", "pa0095au_002_180315_193134.wav", "pa0095au_003_180315_194133.wav", "pa0097au_002_180316_004647.wav", "pa0097au_003_180316_005646.wav"), "ID" = c("HW","None", "None", "HW", "HW"), "n" = c(1,0,0,3,2),"Date" = c('2018-03-15','2018-03-15','2018-03-15', '2018-03-16', '2018-03-16')) 

I would like to have something like

data.frame("file.name" = c("pa0095au_001_180315_192129.wav", "pa0095au_002_180315_193134.wav", pa0095au_003_180315_194133.wav", "pa0097au_002_180316_004647.wav", "pa0097au_002_180316_004647.wav", "pa0097au_002_180316_004647.wav", "pa0097au_003_180316_005646.wav", "pa0097au_003_180316_005646.wav"), "ID" = c("HW","None", "None", "HW", "HW", "HW", "HW", "HW"), "n" = c(1,0,0,1,1,1,1,1), "Date" = c('2018-03-15','2018-03-15','2018-03-15', '2018-03-16', '2018-03-16', '2018-03-16', '2018-03-16', '2018-03-16'))

Date is as Date, file.name as character, ID as factor, n as numeric

Any help would be great.

Upvotes: 1

Views: 78

Answers (2)

Matt
Matt

Reputation: 2987

Using tidyr and dplyr you can do:

library(tidyr)
library(dplyr)

call_obs %>%
uncount(., if_else(n == 0, n +1, n)) %>%
mutate(n = if_else(n > 0, 1, 0))


#                      file.name   ID n       Date
#1 pa0095au_001_180315_192129.wav   HW 1 2018-03-15
#2 pa0095au_002_180315_193134.wav None 0 2018-03-15
#3 pa0095au_003_180315_194133.wav None 0 2018-03-15
#4 pa0097au_002_180316_004647.wav   HW 1 2018-03-16
#5 pa0097au_002_180316_004647.wav   HW 1 2018-03-16
#6 pa0097au_002_180316_004647.wav   HW 1 2018-03-16
#7 pa0097au_003_180316_005646.wav   HW 1 2018-03-16
#8 pa0097au_003_180316_005646.wav   HW 1 2018-03-16

Upvotes: 2

Clemsang
Clemsang

Reputation: 5481

Here is one way in base R using rep on rows:

call_obs <- call_obs[rep(seq(nrow(call_obs)), ifelse(call_obs$n == 0, 1, call_obs$n)),]
call_obs$n <- as.numeric(call_obs$n > 0)

Upvotes: 0

Related Questions