Fionn
Fionn

Reputation: 207

Create new column based on factor and numeric data

I have a data frame that I would like to convert the values in one column into new columns, using the data in the adjacent column. Each factor in df$species would become a new column, and the data for the new column would be the respective data from df$fish_num however, I'm really confused as to go about how to do this and don't know where to start!

This is my current df:

     site treatment section species fish_num
 1 Site 1   Control       A    parr        7
 2 Site 1   Control       A  salmon        6
 3 Site 1   Control       B   trout        4
 4 Site 1   Control       B  salmon       12
 5 Site 1 Treatment       A    parr        8
 6 Site 1 Treatment       A  salmon        5
 7 Site 1 Treatment       B   trout       15
 8 Site 1 Treatment       B  salmon        9

df <- structure(list(site = c("Site 1", "Site 1", "Site 1", "Site 1", 
"Site 1", "Site 1", "Site 1", "Site 1"), treatment = c("Control", 
"Control", "Control", "Control", "Treatment", "Treatment", "Treatment",
"Treatment"), section = c("A", "A", "B", "B", "A", "A", "B",
"B"), species = c("parr", "salmon", "trout", "salmon", "parr",
"salmon", "trout", "salmon"), fish_num = c(7L, 6L, 4L, 12L, 8L,
5L, 15L, 9L)), class = "data.frame", row.names = c("1", "2",
"3", "4", "5", "6", "7", "8"))

And I would like to be able to produce the following:

   site treatment section fish_num parr salmon trout
1 Site 1   Control       A        7    7      0     0
2 Site 1   Control       A        6    0      6     0
3 Site 1   Control       B        4    0      0     4
4 Site 1   Control       B       12    0     12     0
5 Site 1 Treatment       A        8    8      0     0
6 Site 1 Treatment       A        5    0      5     0
7 Site 1 Treatment       B       15    0      0    15
8 Site 1 Treatment       B        9    0      9     0

I'm unsure as to the best approach!

Upvotes: 0

Views: 382

Answers (4)

Abdessabour Mtk
Abdessabour Mtk

Reputation: 3888

using tidyverse's pivot_wider

df %>% 
 mutate(id=1:nrow(.)) %>%
 pivot_wider(names_from = species, values_from=fish_num, id_cols=everything())


#>   site   treatment section    id  parr salmon trout
#>   <fct>  <fct>     <fct>   <int> <int>  <int> <int>
#> 1 Site 1 Control   A           1     7     NA    NA
#> 2 Site 1 Control   A           2    NA      6    NA
#> 3 Site 1 Control   B           3    NA     NA     4
#> 4 Site 1 Control   B           4    NA     12    NA
#> 5 Site 1 Treatment A           5     8     NA    NA
#> 6 Site 1 Treatment A           6    NA      5    NA
#> 7 Site 1 Treatment B           7    NA     NA    15
#> 8 Site 1 Treatment B           8    NA      9    NA

Upvotes: 1

tifu
tifu

Reputation: 1411

One way is using pivot_wider() from tidyr if you wanna stay within the tidyverse. I also added call to mutate_at()and mutate() to replace missing values with zeroes and calculate the column fish_num.

library(tidyverse)

df %>%
   pivot_wider(names_from = species,
             values_from = fish_num) %>%
   mutate_at(c("parr", "salmon", "trout"), ~replace(., is.na(.), 0)) %>%
   mutate(fish_num = parr+salmon+trout)

# A tibble: 4 x 7
site   treatment section  parr salmon trout fish_num
<chr>  <chr>     <chr>   <dbl>  <dbl> <dbl>    <dbl>
1 Site 1 Control   A           7      6     0       13
2 Site 1 Control   B           0     12     4       16
3 Site 1 Treatment A           8      5     0       13
4 Site 1 Treatment B           0      9    15       24

Upvotes: 2

ThomasIsCoding
ThomasIsCoding

Reputation: 101247

A simple base R option is using xtabs

cbind(df,unclass(t(xtabs(fish_num ~species + q,cbind(df,q = 1:nrow(df))))))

which gives

    site treatment section species fish_num parr salmon trout
1 Site 1   Control       A    parr        7    7      0     0
2 Site 1   Control       A  salmon        6    0      6     0
3 Site 1   Control       B   trout        4    0      0     4
4 Site 1   Control       B  salmon       12    0     12     0
5 Site 1 Treatment       A    parr        8    8      0     0
6 Site 1 Treatment       A  salmon        5    0      5     0
7 Site 1 Treatment       B   trout       15    0      0    15
8 Site 1 Treatment       B  salmon        9    0      9     0

Data

> dput(df)
structure(list(site = c("Site 1", "Site 1", "Site 1", "Site 1", 
"Site 1", "Site 1", "Site 1", "Site 1"), treatment = c("Control", 
"Control", "Control", "Control", "Treatment", "Treatment", "Treatment",
"Treatment"), section = c("A", "A", "B", "B", "A", "A", "B",
"B"), species = c("parr", "salmon", "trout", "salmon", "parr",
"salmon", "trout", "salmon"), fish_num = c(7L, 6L, 4L, 12L, 8L,
5L, 15L, 9L)), class = "data.frame", row.names = c("1", "2",
"3", "4", "5", "6", "7", "8"))

Upvotes: 1

cdalitz
cdalitz

Reputation: 1277

You can use the trick that TRUE and FALSE in R numerically represent zero and one:

df$salmon <- df$fish_num * (df$species == "salmon")
df$trout <- df$fish_num * (df$species == "trout")

To do this for all occurring species, put it in a loop over levels(df$species).

Upvotes: 1

Related Questions