Reputation: 8183
I have a data frame users with two columns, id and country:
id country
1 France
2 United States
3 France
I want to add a new column salary, which holds the average salary for the given country.
My first thought was to create a named config vector mapping each country to its salary, like this:
salary_country <- c(
"France"=45000,
"United States"=50000,
...)
And then to create the column like this (using dplyr):
tbl_df(users) %>%
mutate(salary = ifelse(country %in% names(salary_country),
salary_country[country],
0))
It works like a charm. If the country does not exist in my salary_country vector, the salary is set to 0; otherwise it is set to the salary given for that country.
However, it is quite slow on a very large data frame, and quite verbose.
Is there a better way to accomplish this?
Upvotes: 0
Views: 54
Reputation: 31161
You can use match:
salary_country[match(users$country, names(salary_country))]
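Note that the match lookup returns NA, not 0, for a country that is missing from salary_country. A minimal sketch of the full assignment, assuming the 0 fallback from the question is wanted (the Germany row and the shortened two-entry lookup vector are made up for illustration):

```r
# Illustrative data mirroring the question; "Germany" is added to show the fallback
users <- data.frame(
  id = 1:4,
  country = c("France", "United States", "France", "Germany"),
  stringsAsFactors = FALSE
)
salary_country <- c("France" = 45000, "United States" = 50000)

# match() gives the position of each country in the lookup names (NA if absent)
idx <- match(users$country, names(salary_country))

# unname() drops the country names carried over from the lookup vector
users$salary <- unname(salary_country[idx])

# Countries missing from the lookup come back as NA; replace them with 0
users$salary[is.na(users$salary)] <- 0
```

The lookup is done in a single vectorised pass, so there is no need for ifelse or %in%.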
Or go for data.table:
library(data.table)
dt = data.table(salary=salary_country, country=names(salary_country))
dt[setDT(users), on='country']
# salary country id
#1: 45000 France 1
#2: 50000 United States 2
#3: 45000 France 3
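The join above also yields NA for countries without an entry in the lookup table. If the original column order and the 0 fallback matter, an update join is one possible variant. This is only a sketch: the Germany row and the two-row lookup table are made-up illustration data, and it assumes a data.table version that supports on= (1.9.6 or later):

```r
library(data.table)

# Illustrative data; "Germany" has no entry in the lookup table
users <- data.table(
  id = 1:4,
  country = c("France", "United States", "France", "Germany")
)
dt <- data.table(
  country = c("France", "United States"),
  salary = c(45000, 50000)
)

# Update join: add salary to users by reference, keeping its rows and column order
users[dt, on = "country", salary := i.salary]

# Rows without a match are left as NA; set them to 0 as in the question
users[is.na(salary), salary := 0]
```

Because the column is added by reference, users is not copied, which may help on a very large table.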
Upvotes: 1