Reputation: 543
I have a dataframe in R that looks like this:
Word Base Number Type
- - - -
shoe shoe 4834 singular
shoes shoe 49955 plural
toy toy 75465 singular
toys toy 23556 plural
key key 39485 singular
keys key 6546 plural
jazz jazz 58765 plural
I would like to transform it so that it looks like this:
Word_Sg Word_Pl Base Num_Singular Num_Plural
-- -- -- -- --
shoe shoes shoe 4834 49955
toy toys toy 75465 23556
key keys key 39485 6546
NA jazz jazz NA 58765
So rather than having two rows for the values for singular & plural, I want to have two colums, one with the number for singular, and one with the number for plural.
I've tried a few things using dplyr::summarize
, but so far, without any success. Here is the code that I've come up with so far:
dataframe1 <- dataframe %>%
mutate(Num_Singular = case_when(Type == "singular" ~ Number)) %>%
mutate(Num_Plural = case_when(Type == "plural" ~ Number)) %>%
dplyr::select(Word, Base, Num_Singular, Num_Plural) %>%
group_by(Base) %>%
dplyr::summarize(Num_Singular = paste(na.omit(Num_Singular)),
Num_Plural = paste(na.omit(Num_Plural))
However, it gives me this error:
Error in summarise_impl(.data, dots) :
Column `Num_Singular` must be length 1 (a summary value), not 2)
I think the problem might be that there are rows that don't necessarily have singular AND plural, but only either (e.g. "jazz"). Most rows have both though.
So how can I do this in R or dplyr?
Upvotes: 2
Views: 1130
Reputation: 8848
tidyr
's new pivot_wider()
function makes this simple...
library(dplyr)
library(tidyr)
dat <- read.table(header = T, stringsAsFactors = F, text='
Word Base Number Type
shoe shoe 4834 singular
shoes shoe 49955 plural
toy toy 75465 singular
toys toy 23556 plural
key key 39485 singular
keys key 6546 plural
jazz jazz 58765 plural')
dat %>%
pivot_wider(id_cols = Base, names_from = Type, values_from = c(Word, Number))
# # A tibble: 4 x 5
# Base Word_singular Word_plural Number_singular Number_plural
# <chr> <chr> <chr> <int> <int>
# 1 shoe shoe shoes 4834 49955
# 2 toy toy toys 75465 23556
# 3 key key keys 39485 6546
# 4 jazz NA jazz NA 58765
Upvotes: 1
Reputation: 8848
You could join the plural
and singular
subsets of your data by Base
, then remove the Type
columns and reorder the others...
full_join(filter(dat, Type == "plural"),
filter(dat, Type == "singular"),
by = "Base",
suffix = c("_Pl", "_Sg")) %>%
select(Word_Sg, Word_Pl, Base, Number_Sg, Number_Pl)
# Word_Sg Word_Pl Base Number_Sg Number_Pl
# 1 shoe shoes shoe 4834 49955
# 2 toy toys toy 75465 23556
# 3 key keys key 39485 6546
# 4 <NA> jazz jazz NA 58765
Upvotes: 0
Reputation: 8848
The core idea is to identify each data point by it's type and whether it's a word or a number... then it's easy to spread to the format you want. (I won't bother renaming the variables or ordering them specifically to match your expected output because that's easy to do and not part of the problem here)
library(dplyr)
library(tidyr)
dat <- read.table(header = T, stringsAsFactors = F, text='
Word Base Number Type
shoe shoe 4834 singular
shoes shoe 49955 plural
toy toy 75465 singular
toys toy 23556 plural
key key 39485 singular
keys key 6546 plural
jazz jazz 58765 plural')
dat %>%
gather(variable, value, Word, Number) %>%
unite(Type, variable, Type) %>%
spread(Type, value, convert = T) %>%
as_tibble()
# # A tibble: 4 x 5
# Base Number_plural Number_singular Word_plural Word_singular
# <chr> <int> <int> <chr> <chr>
# 1 jazz 58765 NA jazz NA
# 2 key 6546 39485 keys key
# 3 shoe 49955 4834 shoes shoe
# 4 toy 23556 75465 toys toy
Upvotes: 0
Reputation: 160447
If you first look at the first few columns::
select(dat, Base, Word, Type)[1:2,]
# Base Word Type
# 1 shoe shoe singular
# 2 shoe shoes plural
From here, consider it just spreading it into a singular/plural columns, effectively going from "tall" to "wide". (It would be much more apparent if there were more than two categories in Type
.)
select(dat, Base, Word, Type) %>%
spread(Type, Word) %>%
rename(Word_Pl=plural, Word_Sg=singular)
# Base Word_Pl Word_Sg
# 1 jazz jazz <NA>
# 2 key keys key
# 3 shoe shoes shoe
# 4 toy toys toy
You can easily repeat this for Number
as well. From there, it's just a matter of merging/joining them based on the key column, Base
:
full_join(
select(dat, Base, Word, Type) %>%
spread(Type, Word) %>%
rename(Word_Pl=plural, Word_Sg=singular),
select(dat, Base, Number, Type) %>%
spread(Type, Number) %>%
rename(Num_Pl=plural, Num_Sg=singular),
by = "Base"
)
# Base Word_Pl Word_Sg Num_Pl Num_Sg
# 1 jazz jazz <NA> 58765 NA
# 2 key keys key 6546 39485
# 3 shoe shoes shoe 49955 4834
# 4 toy toys toy 23556 75465
Consumable data:
library(dplyr)
library(tidyr)
dat <- read.table(text='Word Base Number Type
shoe shoe 4834 singular
shoes shoe 49955 plural
toy toy 75465 singular
toys toy 23556 plural
key key 39485 singular
keys key 6546 plural
jazz jazz 58765 plural', header=TRUE, stringsAsFactors=FALSE)
Upvotes: 4