I have a large data frame looking like this. Name Gender Count Probability James M 12345 0.012313 John M 12334 0.012314 Robert M 12300 0.0124 Michael M 12222 0.13 William M 12121 0.123131 Mary F 10002 0.123112 I was wondering how i can get the sum of the counts of all of the names that start with each letter of the alphabet in the dataframe and create a new dataframe that looks like Name Count A 22222 B 44444 C 66666

Reputation: 23

How to find the sum of a column based on another column and put it into a new dataframe In R

I have a large data frame looking like this.

Name	Gender	Count	Probability
James	M	12345	0.012313
John	M	12334	0.012314
Robert	M	12300	0.0124
Michael	M	12222	0.13
William	M	12121	0.123131
Mary	F	10002	0.123112

I was wondering how i can get the sum of the counts of all of the names that start with each letter of the alphabet in the dataframe and create a new dataframe that looks like

Name Count
A    22222
B    44444
C    66666

Upvotes: 1

Answers (4)

Anoushiravan R

Reputation: 21938

I think you can use this solution:

library(dplyr)
library(stringr)

df %>%
  mutate(initial = str_sub(Name, 1L, 1L)) %>%
  group_by(initial) %>%
  summarise(count = sum(Count, na.rm = TRUE)) %>%
  as.data.frame()

  initial count
1       J 24679
2       M 22224
3       R 12300
4       W 12121

Data

df <- tribble(
  ~Name,    ~Gender,    ~Count,     ~Probability,
  "James",  "M",    12345,  0.012313,
  "John",   "M",    12334,  0.012314,
  "Robert",     "M",    12300,  0.0124,
  "Michael",    "M",    12222,  0.13,
  "William",    "M",    12121,  0.123131,
  "Mary",   "F",    10002,  0.123112
)

Upvotes: 2

Ronak Shah

Reputation: 389355

You can use tapply in base R :

with(df, tapply(Count, substr(Name, 1, 1), sum))

Or if we you want output as dataframe.

stack(with(df, tapply(Count, substr(Name, 1, 1), sum)))

Upvotes: 1

dcarlson

Reputation: 11076

It is better to provide data with dput:

df <- structure(list(Name = c("James", "John", "Robert", "Michael", 
"William", "Mary"), Gender = c("M", "M", "M", "M", "M", "F"), 
    Count = c(12345L, 12334L, 12300L, 12222L, 12121L, 10002L), 
    Probability = c(0.012313, 0.012314, 0.0124, 0.13, 0.123131, 
    0.123112)), class = "data.frame", row.names = c(NA, -6L))

Then just use xtabs:

Letter <- substr(df$Name, 1, 1)    
xtabs(Count~Letter, df)
# Letter
#     J     M     R     W 
# 24679 22224 12300 12121

Or if you want it vertically organized as a data.frame:

data.frame(xtabs(Count~Letter, df))
#   Letter  Freq
# 1      J 24679
# 2      M 22224
# 3      R 12300
# 4      W 12121

Upvotes: 2

AnilGoyal

Reputation: 26236

using baseR aggregate

aggregate(Count ~ substr(Name, 1, 1), df, sum)

  substr(Name, 1, 1) Count
1                  J 24679
2                  M 22224
3                  R 12300
4                  W 12121

Upvotes: 1

How to find the sum of a column based on another column and put it into a new dataframe In R

Answers (4)

Related Questions