nicshah
nicshah

Reputation: 345

String as formula

I've tried searching through the forums and unable to find help. I'm quite new to R, and am having limited success in loading in some strings to be used as a formula.

I have a csv with the following format:

            Sam, Frank, Dennis, Steven, Robert, Georgia
Region_1    218,    763,    811,    812,    797,    574
Region_2    474,    983,    343,    697,    310,    335
Region_3    700,    813,    133,    212,    906,    680
Region_4    212,    581,    893,    514,    530,    795

And am able to load this in and add additional columns to my data frame that adds specific columns (ie. Sam + Frank, Dennis + Georgia).

I can do it if I hard code the teams into my code, eg:

temp <- mutate(temp, team_1 = Sam + Robert) 

My issue is that I have another csv that contains a list of 2 columns, and these are teams that changes daily, eg:

Team 1  Sam + Robert
Team 2  Frank + Georgia
Team 3  Frank + Steven
Team 4  Robert + Dennis
Team 5  Frank + Sam

Is there a way that I can load this csv list into R and add these team member columns together.

But I can't see to add this list in from the csv file and have a new data frame with the total of team members dynamically.

Is anyone able to help me with this?

I'm looking to end up with something like this (with the sum of the team members (team 1, team 2, etc from the second csv file) as additional columns:

         Sam,   Frank,  Dennis, Steven, Robert, Georgia,    Sam + Robert,   Frank + Georgia,    Frank + Steven, Robert + Dennis,    Frank + Sam
 Region_1   218,    763,    811,    812,    797,    574,    981,    1337,   1575,   1608,   981
 Region_2   474,    983,    343,    697,    310,    335,    1457,   1318,   1680,   653,    1457
 Region_3   700,    813,    133,    212,    906,    680,    1513,   1493,   1025,   1039,   1513
 Region_4   212,    581,    893,    514,    530,    795,    793,    1376,   1095,   1423,   793

With much thanks.

Upvotes: 8

Views: 304

Answers (4)

mt1022
mt1022

Reputation: 17299

(Thank @alistaire for coding the data in a handy format)

This can be done easily with base R functions (eval + parse) in a simple for-loop:

for(i in teams$members){
    people[[i]] <- eval(parse(text = i), envir = people)
}

# > people
#     region Sam Frank Dennis Steven Robert Georgia Sam + Robert Frank + Georgia
# 1 Region_1 218   763    811    812    797     574         1015            1337
# 2 Region_2 474   983    343    697    310     335          784            1318
# 3 Region_3 700   813    133    212    906     680         1606            1493
# 4 Region_4 212   581    893    514    530     795          742            1376
# Frank + Steven Robert + Dennis Frank + Sam
# 1           1575            1608         981
# 2           1680             653        1457
# 3           1025            1039        1513
# 4           1095            1423         793

Upvotes: 2

arvi1000
arvi1000

Reputation: 9592

Background: You can access data.frame columns stored in vector x with people[, x], and you can use strsplit to get a vector of the columns you need.

Answer:

# function to get the sum of two columns from 'people'
get_sum <- function(m) {
  members <- unlist(strsplit(m, ' + ', fixed = T))
  rowSums(people[, members])
}

# "Sam + Robert"
get_sum(teams$members[1])
# [1] 1015  784 1606  742

# apply this for every team, and add results to 'people'
cbind(people,
      sapply(teams$members, get_sum))

#     region Sam Frank Dennis Steven Robert Georgia Sam + Robert Frank + Georgia Frank + Steven Robert + Dennis Frank + Sam
# 1 Region_1 218   763    811    812    797     574         1015            1337           1575            1608         981
# 2 Region_2 474   983    343    697    310     335          784            1318           1680             653        1457
# 3 Region_3 700   813    133    212    906     680         1606            1493           1025            1039        1513
# 4 Region_4 212   581    893    514    530     795          742            1376           1095            1423         793

Upvotes: 2

Maurits Evers
Maurits Evers

Reputation: 50738

Here is an option using rlang::parse_expr to directly parse the expression from the second data.frame

df1 %>% bind_cols(map_dfc(df2$V2, ~df1 %>%
    transmute(!!as.character(.x) := !!rlang::parse_expr(as.character(.x)))))
#  Sam Frank Dennis Steven Robert Georgia Sam + Robert Frank + Georgia
#1 218   763    811    812    797     574         1015            1337
#2 474   983    343    697    310     335          784            1318
#3 700   813    133    212    906     680         1606            1493
#4 212   581    893    514    530     795          742            1376
#  Frank + Steven Robert + Dennis Frank + Sam
#1           1575            1608         981
#2           1680             653        1457
#3           1025            1039        1513
#4           1095            1423         793

Or you can name the new columns according to the team:

df1 %>% bind_cols(map2_dfc(df2$V1, df2$V2, ~df1 %>%
    transmute(!!as.character(.x) := !!rlang::parse_expr(as.character(.y)))))
#  Sam Frank Dennis Steven Robert Georgia Team 1 Team 2 Team 3 Team 4 Team 5
#1 218   763    811    812    797     574   1015   1337   1575   1608    981
#2 474   983    343    697    310     335    784   1318   1680    653   1457
#3 700   813    133    212    906     680   1606   1493   1025   1039   1513
#4 212   581    893    514    530     795    742   1376   1095   1423    793

The idea is that we use map (or map2) to apply the "formula" stored in df2 within a transmute context of df1. The expression gets evaluated with !!parse_expr(...).


Sample data

df1 <- read.table(text =
    "Sam Frank Dennis Steven Robert Georgia
Region_1    218    763    811    812    797    574
Region_2    474    983    343    697    310    335
Region_3    700    813    133    212    906    680
Region_4    212    581    893    514    530    795", header = T)


df2 <- read.table(text =
    "'Team 1'  'Sam + Robert'
'Team 2'  'Frank + Georgia'
'Team 3'  'Frank + Steven'
'Team 4'  'Robert + Dennis'
'Team 5'  'Frank + Sam'", header = F)

Upvotes: 3

alistaire
alistaire

Reputation: 43354

Working with code as strings is sort of complicated. Here, it's simpler to turn the second data frame into a lookup table which can be joined to the first reshaped to long form. The result can then be summarized and reshaped into whatever form you'd like.

library(tidyverse)

people <- data.frame(
    region = c("Region_1", "Region_2", "Region_3", "Region_4"),
    Sam = c(218L, 474L, 700L, 212L),
    Frank = c(763L, 983L, 813L, 581L),
    Dennis = c(811L, 343L, 133L, 893L),
    Steven = c(812L, 697L, 212L, 514L),
    Robert = c(797L, 310L, 906L, 530L),
    Georgia = c(574L, 335L, 680L, 795L), 
    stringsAsFactors = FALSE
)

teams <- data.frame(
    team = c("Team 1", "Team 2", "Team 3", "Team 4", "Team 5"),
    members = c("Sam + Robert", "Frank + Georgia", "Frank + Steven", "Robert + Dennis", "Frank + Sam"), 
    stringsAsFactors = FALSE
)

teams_long <- teams %>% 
    mutate(team = members) %>%    # set team name
    separate_rows(members)    # separate teams to names

people %>% 
    gather(member, value, -region) %>%    # reshape to long form
    left_join(teams_long, c('member' = 'members')) %>%    # join by names
    group_by(region, team) %>% 
    summarise(value = sum(value)) %>%    # aggregate
    spread(team, value) %>%    # reshape to wide form
    left_join(people, .)    # join back to original data
#>     region Sam Frank Dennis Steven Robert Georgia Frank + Georgia
#> 1 Region_1 218   763    811    812    797     574            1337
#> 2 Region_2 474   983    343    697    310     335            1318
#> 3 Region_3 700   813    133    212    906     680            1493
#> 4 Region_4 212   581    893    514    530     795            1376
#>   Frank + Sam Frank + Steven Robert + Dennis Sam + Robert
#> 1         981           1575            1608         1015
#> 2        1457           1680             653          784
#> 3        1513           1025            1039         1606
#> 4         793           1095            1423          742

Upvotes: 4

Related Questions