Reputation: 2071
I have this df column:
df <- data.frame(Strings = c("ñlas onepojasd", "onenañdsl", "ñelrtwofkld", "asdthreeasp", "asdfetwoasd", "fouroqwke","okasdtwo", "acmofour", "porefour", "okstwo"))
> df
Strings
1 ñlas onepojasd
2 onenañdsl
3 ñelrtwofkld
4 asdthreeasp
5 asdfetwoasd
6 fouroqwke
7 okasdtwo
8 acmofour
9 porefour
10 okstwo
I know that each value in df$Strings will match one of the words one, two, three or four, and I also know that it will match just ONE of those words. So, to match them:
library(stringr)
str_detect(df$Strings,"one")
str_detect(df$Strings,"two")
str_detect(df$Strings,"three")
str_detect(df$Strings,"four")
However, I'm stuck here, as I'm trying to build this table:
Homes  Quantity  Percent
One    2         0.2
Two    4         0.4
Three  1         0.1
Four   3         0.3
Total  10        1
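Because every string matches exactly one word, one possible way to get from the four str_detect calls to the Quantity and Percent columns is to sum each logical vector. This is a minimal sketch, assuming stringr is installed; the `words` vector is a hypothetical helper introduced only for this example.

```r
library(stringr)

df <- data.frame(Strings = c("ñlas onepojasd", "onenañdsl", "ñelrtwofkld",
                             "asdthreeasp", "asdfetwoasd", "fouroqwke",
                             "okasdtwo", "acmofour", "porefour", "okstwo"))

# Hypothetical helper vector holding the four target words
words <- c("one", "two", "three", "four")

# str_detect() gives one TRUE/FALSE per row; sum() counts the TRUEs.
# sapply() names each result after the word it was computed for.
Quantity <- sapply(words, function(w) sum(str_detect(df$Strings, w)))

# Each row matches exactly one word, so dividing by the total gives proportions
Percent <- Quantity / sum(Quantity)

Quantity  # one = 2, two = 4, three = 1, four = 3
Percent   # one = 0.2, two = 0.4, three = 0.1, four = 0.3
```

Summing logicals only gives correct proportions here because of the stated assumption that no string matches two of the words.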
Upvotes: 0
Views: 578
Reputation: 887291
A base R option would be regmatches/regexpr with table:
table(regmatches(df$Strings, regexpr('one|two|three|four', df$Strings)))
# four one three two
# 3 2 1 4
Adding addmargins to get the sum, and then dividing by that:
out <- addmargins(table(regmatches(df$Strings,
                                   regexpr('one|two|three|four', df$Strings))))
out/out[length(out)]
# four one three two Sum
# 0.3 0.2 0.1 0.4 1.0
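table() sorts the words alphabetically (four, one, three, two). If you want the rows in the same order as the target table, one option, sketched here in base R only, is to turn the matches into a factor with explicit levels before tabulating. The `words` vector is a hypothetical helper for this example.

```r
df <- data.frame(Strings = c("ñlas onepojasd", "onenañdsl", "ñelrtwofkld",
                             "asdthreeasp", "asdfetwoasd", "fouroqwke",
                             "okasdtwo", "acmofour", "porefour", "okstwo"))

# Hypothetical helper vector; its order sets the row order of the table
words <- c("one", "two", "three", "four")

# regexpr() locates the first match in each string; regmatches() extracts it
matches <- regmatches(df$Strings, regexpr(paste(words, collapse = "|"), df$Strings))

# Explicit factor levels keep the rows in the order of `words`;
# addmargins() appends a "Sum" entry at the end
out <- addmargins(table(factor(matches, levels = words)))
out / out[length(out)]
```

Declaring the levels also means a word that never matches still shows up with a count of 0 instead of disappearing from the table.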
Upvotes: 0
Reputation: 39858
With tidyverse and janitor you can do:
library(tidyverse)
library(janitor)

df %>%
  mutate(Homes = str_extract(Strings, "one|two|three|four"),
         n = n()) %>%
  group_by(Homes) %>%
  summarise(Quantity = length(Homes),
            Percent = first(length(Homes)/n)) %>%
  adorn_totals("row")
Homes Quantity Percent
four 3 0.3
one 2 0.2
three 1 0.1
two 4 0.4
Total 10 1.0
Or with just the tidyverse:
library(tidyverse)

df %>%
  mutate(Homes = str_extract(Strings, "one|two|three|four"),
         n = n()) %>%
  group_by(Homes) %>%
  summarise(Quantity = length(Homes),
            Percent = first(length(Homes)/n)) %>%
  rbind(., data.frame(Homes = "Total", Quantity = sum(.$Quantity),
                      Percent = sum(.$Percent)))
In both cases, the code first extracts the matched word from each string and counts the total number of rows. Second, it groups by the matched word. Third, it computes the number of cases per word and that word's proportion of all cases. Finally, it adds a "Total" row.
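The grouping and counting steps described above can also be collapsed into dplyr's count(), which groups and counts in one call. This is a sketch assuming dplyr and stringr are installed; it swaps rbind() for bind_rows() to append the Total row, which is a choice made for this example rather than part of the answer above.

```r
library(dplyr)
library(stringr)

df <- data.frame(Strings = c("ñlas onepojasd", "onenañdsl", "ñelrtwofkld",
                             "asdthreeasp", "asdfetwoasd", "fouroqwke",
                             "okasdtwo", "acmofour", "porefour", "okstwo"))

result <- df %>%
  # Extract the matched word from each string
  mutate(Homes = str_extract(Strings, "one|two|three|four")) %>%
  # count() groups by Homes and counts the rows in each group
  count(Homes, name = "Quantity") %>%
  # Each word's share of all rows
  mutate(Percent = Quantity / sum(Quantity))

# Append the Total row
result <- bind_rows(result,
                    tibble(Homes = "Total",
                           Quantity = sum(result$Quantity),
                           Percent = sum(result$Percent)))
result
```

Because Percent is computed from sum(Quantity) after counting, this version does not need the helper column n.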
Upvotes: 2
Reputation: 51592
You can use str_extract and then table and prop.table, i.e.
library(stringr)
str_extract(df$Strings, 'one|two|three|four')
#[1] "one" "one" "two" "three" "two" "four" "two" "four" "four" "two"
table(str_extract(df$Strings, 'one|two|three|four'))
# four one three two
# 3 2 1 4
prop.table(table(str_extract(df$Strings, 'one|two|three|four')))
# four one three two
# 0.3 0.2 0.1 0.4
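To turn these two tables into the Homes/Quantity/Percent layout from the question, one possible approach is to place them side by side in a data frame and append a Total row. This is a sketch assuming stringr is installed; the intermediate names `homes`, `counts` and `result` are chosen only for this example.

```r
library(stringr)

df <- data.frame(Strings = c("ñlas onepojasd", "onenañdsl", "ñelrtwofkld",
                             "asdthreeasp", "asdfetwoasd", "fouroqwke",
                             "okasdtwo", "acmofour", "porefour", "okstwo"))

# Extract the matched word, then count occurrences of each word
homes  <- str_extract(df$Strings, "one|two|three|four")
counts <- table(homes)

# One row per word: the counts and their proportions side by side
result <- data.frame(Homes    = names(counts),
                     Quantity = as.vector(counts),
                     Percent  = as.vector(prop.table(counts)))

# Append the Total row
result <- rbind(result, data.frame(Homes    = "Total",
                                   Quantity = sum(result$Quantity),
                                   Percent  = sum(result$Percent)))
result
```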
Upvotes: 1