deathcon501
deathcon501

Reputation: 13

Best way to find weighted averages from two dataframes in R?

new to R so sorry if this is a bit broad but I'm not really even sure where to start with an approach to this problem.

I have two dataframes, df1 containing demographic data from certain Census tracts.

           AfricanAmerican AsianAmerican    Hispanic    White
Tract1                0.25          0.25        0.25     0.25
Tract2                0.50          0.10        0.20     0.10  
Tract3                0.05          0.10        0.35     0.50

And df2 contains observations polygons with the percentage of each census tract that makes up its area.

           Poly1      Poly2    Poly3    
Tract1      0.33       0.25     0.00     
Tract2      0.33       0.25     0.10      
Tract3      0.34       0.50     0.90    

What I want to do is get the weighted averages of the demographic data in each observation polygon

          AfricanAmerican AsianAmerican    Hispanic    White
Poly1                0.26          0.15        0.27     0.29
Poly2                0.21          0.14        0.29     0.34  
Poly3                0.10          0.10        0.34     0.46

So far I'm thinking I could do something like

sum(df1$AfricanAmerican * df2$Poly1) 

Then use a for loop to iterate over all demographic variables for one polygon, then nest that in another for loop to iterate over all polygons, but given that I have hundreds of Census tracts and polygons in my working dataset, I was wondering if there was a better approach?

Upvotes: 0

Views: 55

Answers (2)

jay.sf
jay.sf

Reputation: 73802

Use colSums of the products in mapply.

t(mapply(function(...) colSums(`*`(...)), list(df1), df2))
#      AfricanAmerican AsianAmerican Hispanic  White
# [1,]          0.2645        0.1495   0.2675 0.2855
# [2,]          0.2125        0.1375   0.2875 0.3375
# [3,]          0.0950        0.1000   0.3350 0.4600

If you want to round to two digits, just wrap round(..., 2) around it.


Data:

df1 <- read.table(header=T, text='
                           AfricanAmerican AsianAmerican    Hispanic    White
     Tract1                0.25          0.25        0.25     0.25
     Tract2                0.50          0.10        0.20     0.10  
     Tract3                0.05          0.10        0.35     0.50
')
df2 <- read.table(header=T, text='
                Poly1      Poly2    Poly3    
     Tract1      0.33       0.25     0.00     
     Tract2      0.33       0.25     0.10      
     Tract3      0.34       0.50     0.90  
')

Upvotes: 0

Vin&#237;cius F&#233;lix
Vin&#237;cius F&#233;lix

Reputation: 8836

Libraries

library(tidyverse)

Sample Data

df1 <-
  tibble(
    Trat = paste0("Trat",1:3),
    AfricanAmerican = c(.25,.5,.05),
    AsianAmerican = c(.25,.1,.1),
    Hispanic = c(.25,.2,.35)
  )

df2 <-
  tibble(
    Trat = paste0("Trat",1:3),
    Poly1 = c(.33,.33,.34),
    Poly2 = c(.25,.25,.5),
    Poly3 = c(0,.1,.9)
  ) %>% 
  #Pivot df2 making a single column for all Poly values
  pivot_longer(cols = -Trat,names_to = "Poly")

Code

df1 %>% 
  #Join df1 and df2 by Trat
  left_join(df2) %>% 
  #Grouping by Poly
  group_by(Poly) %>%
  #Sum values across variables AfricanAmerican to Hispanic
  summarise(across(AfricanAmerican:Hispanic,function(x)sum(x*value)))

Output

Joining, by = "Trat"
# A tibble: 3 x 4
  Poly  AfricanAmerican AsianAmerican Hispanic
  <chr>           <dbl>         <dbl>    <dbl>
1 Poly1           0.264         0.150    0.268
2 Poly2           0.212         0.138    0.288
3 Poly3           0.095         0.1      0.335

Upvotes: 0

Related Questions