the_t_test_1
the_t_test_1

Reputation: 1263

Basic geographically weighted regression

I am familiar with QGIS but struggling with R here, and I'd like some help to do a basic geographically weighted regression with some data that is based on the centroid points of New York City's PUMA shapefile (55 points, one for each PUMA, which is basically like a big census tract).

This is my data in csv: https://www.sendspace.com/file/pj48b5

Also, if necessary (probably not), here is the shapefile data: https://www.sendspace.com/file/wbqrpb

As you can see, the table is in the following format:

lat         lng         variable_a  2015_median 9_yr_change 9_yr_change_new pc_change
40.8912378  -73.9101365 6           1200        380         480             31.6666666667
40.8901905  -73.8614272 8           1100        280         200             25.4545454545
40.8502191  -73.8050669 11          1100        300         530             27.2727272727
40.8561725  -73.8525618 2           1100        320         205             29.0909090909

If I do the basic regression of two variables, as so:

fit <- lm(variable_a ~ X9_yr_change_new, data=s_data)
summary(fit)

Then I get an R squared of 0.42

What I want to do next is to test the same two variables, but by using the lat and lng variables (co-ordinates of the centroids) to see if there's a stronger relationship when the geographical proximity of these points is taken into account.

Can anyone tell me the easiest way to do this either in QGIS or in R?

Upvotes: 0

Views: 887

Answers (1)

Sathish
Sathish

Reputation: 12723

For weighted regression, you have to first find the weights based on location. It can be done by averaging the variable_a response for every group of lat/lng, and count the number of responses in each group. This number will become the weights for the average response of ave_var_a. Then conduct weighted regression by passing weights = number to the lm function.

Since your data has only one response per location, the fitted results of both unweighted and weighted regression are same. It can be seen using summary.aov() function.

I am showing both unweighted and weighted regression below.

Setting up weighted data:

df1 <- read.table(file = 's_data.csv', header = TRUE, sep = ',', stringsAsFactors = FALSE )
head(df1)
#        lat       lng variable_a X2015_median X9_yr_change X9_yr_change_new pc_change
# 1 40.89124 -73.91014          6         1200          380              480  31.66667
# 2 40.89019 -73.86143          8         1100          280              200  25.45455
# 3 40.85022 -73.80507         11         1100          300              530  27.27273
# 4 40.85617 -73.85256          2         1100          320              205  29.09091
# 5 40.84518 -73.88736         21          850          260              250  30.58824
# 6 40.86465 -73.90325          2         1000          230              300  23.00000

library(data.table)
setDT(df1)

df1[, 
    j = `:=` (number   = .N,   # total number of responses per location
              ave_var_a = mean(variable_a)),  # average response per location
    by = c('lat', 'lng')]

head(df1)
#         lat       lng variable_a X2015_median X9_yr_change X9_yr_change_new pc_change number ave_var_a
# 1: 40.89124 -73.91014          6         1200          380              480  31.66667      1         6
# 2: 40.89019 -73.86143          8         1100          280              200  25.45455      1         8
# 3: 40.85022 -73.80507         11         1100          300              530  27.27273      1        11
# 4: 40.85617 -73.85256          2         1100          320              205  29.09091      1         2
# 5: 40.84518 -73.88736         21          850          260              250  30.58824      1        21
# 6: 40.86465 -73.90325          2         1000          230              300  23.00000      1         2

Perform lm unweighted and weighted regression:

# unweighted regression
fit <- lm( variable_a ~ X9_yr_change_new, data= df1 )
summary.aov(fit)
#                    Df  Sum Sq Mean Sq F value   Pr(>F)    
#   X9_yr_change_new  1 6537830 6537830   39.23 6.89e-08 ***
#   Residuals        53 8833855  166677                     
# ---
#   Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

# weighted regression
weighted_fit <- lm( ave_var_a ~ X9_yr_change_new, data= df1, weights = number )
summary.aov(weighted_fit)
#                    Df  Sum Sq Mean Sq F value   Pr(>F)    
#   X9_yr_change_new  1 6537830 6537830   39.23 6.89e-08 ***
#   Residuals        53 8833855  166677                     
# ---
#   Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Upvotes: 2

Related Questions