Reputation: 71
replace value of a column with another column I am working on a pretreatment process of a data set. Here is an over view of my dataset
BSTN ASTN
150 216
300 222
4310 35
60 4310
150 23
4310 60
. .
. .
. .
The numbers BSTN ASTN indicate code number of stations I have a another data set of BSTN,ASTN numbers that show station numbers. Here is what it looks like
Station code Station
150 A
300 A
4310 B
23 C
60 C
. .
. .
. .
The problem is that some stations have two or more codes. for example station code for station A is 150, 300
I would like to unify the stations numbers in BSTN and ASTN which have multiple station numbers into one unified station number. I would also like to unify the numbers of the code in BSTN, ASTN into a number with more frequency
BSTN ASTN
150 216
150 222
4310 35
60 4310
150 60
4310 60
. .
. .
. .
The final output I want would look like below.
I would appreciate some help
Upvotes: 0
Views: 58
Reputation: 2143
Let's get in your data
library(readr)
stations <- read_table(
"BSTN ASTN
150 216
300 222
4310 35
60 4310
150 23
4310 60")
stCodex <- read_table(
"Station_code Station
150 A
300 A
4310 B
23 C
60 C "
)
Let's put both columns into a single tibble, so we can find out what's the most popular Station_code for each Station
st2 <- tibble(replaceBy = c(stations$BSTN, stations$ASTN)) %>%
left_join(stCodex, by = c("replaceBy" = "Station_code")) %>%
filter(!is.na(Station)) %>%
group_by(Station, replaceBy) %>%
summarise(Count = n()) %>%
# We have a count of each Station's Station_codes. Find the most frequent
mutate(BestCount = max(Count)) %>%
filter(Count == BestCount) %>%
summarise_all(~ first(.)) %>%
arrange(Station) %>%
select(Station, replaceBy)
That looks like:
# A tibble: 3 x 2
Station replaceBy
<chr> <dbl>
1 A 150
2 B 4310
3 C 60
Build a replacement table - to replace the low-freq Station_codes with higher ones.
rplc <- stCodex %>%
left_join(st2, by = "Station") %>%
filter(Station_code != replaceBy) %>%
select(- Station)
Giving
# A tibble: 2 x 2
Station_code replaceBy
<dbl> <dbl>
1 300 150
2 23 60
Do the replacements - first in BSTN, then in ASTN
stations %>%
left_join(rplc, by = c(BSTN ="Station_code")) %>%
mutate(BSTN = ifelse(is.na(replaceBy), BSTN, replaceBy)) %>%
select(- replaceBy) %>%
left_join(rplc, by = c(ASTN ="Station_code")) %>%
mutate(ASTN = ifelse(is.na(replaceBy), ASTN, replaceBy)) %>%
select(- replaceBy)
The answer is
# A tibble: 6 x 2
BSTN ASTN
<dbl> <dbl>
1 150 216
2 150 222
3 4310 35
4 60 4310
5 150 60
6 4310 60
Upvotes: 1