Reputation: 23
I have a large data frame looking like this.
Name | Gender | Count | Probability |
---|---|---|---|
James | M | 12345 | 0.012313 |
John | M | 12334 | 0.012314 |
Robert | M | 12300 | 0.0124 |
Michael | M | 12222 | 0.13 |
William | M | 12121 | 0.123131 |
Mary | F | 10002 | 0.123112 |
I was wondering how i can get the sum of the counts of all of the names that start with each letter of the alphabet in the dataframe and create a new dataframe that looks like
Name Count
A 22222
B 44444
C 66666
Upvotes: 1
Views: 73
Reputation: 21938
I think you can use this solution:
library(dplyr)
library(stringr)
df %>%
mutate(initial = str_sub(Name, 1L, 1L)) %>%
group_by(initial) %>%
summarise(count = sum(Count, na.rm = TRUE)) %>%
as.data.frame()
initial count
1 J 24679
2 M 22224
3 R 12300
4 W 12121
Data
df <- tribble(
~Name, ~Gender, ~Count, ~Probability,
"James", "M", 12345, 0.012313,
"John", "M", 12334, 0.012314,
"Robert", "M", 12300, 0.0124,
"Michael", "M", 12222, 0.13,
"William", "M", 12121, 0.123131,
"Mary", "F", 10002, 0.123112
)
Upvotes: 2
Reputation: 389355
You can use tapply
in base R :
with(df, tapply(Count, substr(Name, 1, 1), sum))
Or if we you want output as dataframe.
stack(with(df, tapply(Count, substr(Name, 1, 1), sum)))
Upvotes: 1
Reputation: 11076
It is better to provide data with dput:
df <- structure(list(Name = c("James", "John", "Robert", "Michael",
"William", "Mary"), Gender = c("M", "M", "M", "M", "M", "F"),
Count = c(12345L, 12334L, 12300L, 12222L, 12121L, 10002L),
Probability = c(0.012313, 0.012314, 0.0124, 0.13, 0.123131,
0.123112)), class = "data.frame", row.names = c(NA, -6L))
Then just use xtabs
:
Letter <- substr(df$Name, 1, 1)
xtabs(Count~Letter, df)
# Letter
# J M R W
# 24679 22224 12300 12121
Or if you want it vertically organized as a data.frame
:
data.frame(xtabs(Count~Letter, df))
# Letter Freq
# 1 J 24679
# 2 M 22224
# 3 R 12300
# 4 W 12121
Upvotes: 2
Reputation: 26236
using baseR aggregate
aggregate(Count ~ substr(Name, 1, 1), df, sum)
substr(Name, 1, 1) Count
1 J 24679
2 M 22224
3 R 12300
4 W 12121
Upvotes: 1