Evan
Evan

Reputation: 1499

How to tally up a new column in a dataframe based on values in a separate dataframe?

I have a df1 with some repeating ZIP codes:

    ZIP Weekend Sales
1 60657       1    1
2 60657       1    1
3 60657       1    1
4 60641       0   NA
5 60641       0   NA
6 60607       0   NA
7 60607       0    1
8 60628       0    1

I have an additional df2 with 60 unique ZIP codes:

    ZCTA   ZIP Population
6  60657 60657      82739
7  60609 60609      64906
10 60612 60612      33472
13 60641 60641     40603
14 60616 60616      48433
15 60607 60607      84155
16 60628 60628      92084
17 60619 60619      63825

I would like to make a new column of df2 titled Sites that is a tally of the total number of times the ZIP code repeats in df1. The output would be:

    ZCTA   ZIP Population  Sites
6  60657 60657      82739    3
7  60609 60609      64906    0
10 60641 60612      33472    2
13 60607 60615      40603    2
14 60616 60616      48433    0
15 60617 60617      84155    0
16 60628 60628      92084    1
17 60619 60619      63825    0

Upvotes: 0

Views: 69

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 388817

You can count and then join.

library(dplyr)

df1 %>%
  count(ZIP, name = "Sites") %>%
  right_join(df2, by = 'ZIP') %>%
  mutate(Sites = replace(Sites, is.na(Sites), 0))

# A tibble: 8 x 4
#    ZIP Sites   ZCTA Population
#  <int> <dbl> <int>      <int>
#1 60657     3 60657      82739
#2 60609     0 60609      64906
#3 60612     0 60612      33472
#4 60641     2 60641      40603
#5 60616     0 60616      48433
#6 60607     2 60607      84155
#7 60628     1 60628      92084
#8 60619     0 60619      63825

In base R, that would be :

merge(df2, aggregate(Weekend~ZIP, df1, length), all.x = TRUE)

and then replace NA by 0 as usual.

Upvotes: 0

r2evans
r2evans

Reputation: 160407

library(dplyr)
df1 %>%
  group_by(ZIP) %>%
  tally(name = "Sites") %>%
  right_join(df2, by = "ZIP") %>%
  mutate(Sites = if_else(is.na(Sites), 0L, Sites))
# # A tibble: 8 x 4
#     ZIP Sites  ZCTA Population
#   <int> <int> <int>      <int>
# 1 60657     3 60657      82739
# 2 60609     0 60609      64906
# 3 60612     0 60612      33472
# 4 60641     2 60641      40603
# 5 60616     0 60616      48433
# 6 60607     2 60607      84155
# 7 60628     1 60628      92084
# 8 60619     0 60619      63825

Data:

df1 <- read.table(header=TRUE, stringsAsFactors=FALSE, text="
  ZIP Weekend Sales
60657       1    1
60657       1    1
60657       1    1
60641       0   NA
60641       0   NA
60607       0   NA
60607       0    1
60628       0    1")

df2 <- read.table(header=TRUE, stringsAsFactors=FALSE, text="
 ZCTA   ZIP Population
60657 60657      82739
60609 60609      64906
60612 60612      33472
60641 60641     40603
60616 60616      48433
60607 60607      84155
60628 60628      92084
60619 60619      63825")

Upvotes: 2

Related Questions