Reputation: 17
I am having trouble matching data by the following key:
The first data frame contains the columns ID and Distance. As below.
ID | Distance |
---|---|
A | 30 |
B | 50 |
C | 300 |
D | 120 |
E | 50 |
The second data frame contains numbers that change in years (Year) and are different from the data included in Distance column of the first data frame. As below.
Year | 30 | 50 | 120 | 300 |
---|---|---|---|---|
2020 | 0.12 | 0.20 | 0.31 | 0.45 |
2021 | 0.14 | 0.23 | 0.33 | 0.46 |
2022 | 0.16 | 0.26 | 0.35 | 0.47 |
2023 | 0.18 | 0.29 | 0.37 | 0.48 |
2024 | 0.20 | 0.32 | 0.39 | 0.49 |
My question is. How do effectively assign these data in the third empty data frame, which contains different IDs characterized by different Distance and years for which the rate from Distance is varying. The third final df with examples of expected data is above. I am tryfing to use double loop but i am faild. Basically I am newbie in R and programming.
Year | A | B | C | D | E |
---|---|---|---|---|---|
2020 | ex. 0.20 | ||||
2021 | ex. 0.33 | ||||
2022 | ex. 0,26 | ||||
2023 | ex. 0.18 | ||||
2024 | ex. 0.39 |
Many thanks for any suggestions and solutions.
Upvotes: 0
Views: 66
Reputation: 145745
If all of the DF1$Distance
values are present in the DF2
column names (and vice versa), then this should work:
# select DF2 columns by the DF1 distance values
result = DF2[c("Year", as.character(DF1$Distance))]
# change the names to the ID values
names(result)[-1] = DF1$ID)
In the small dput()
sample, however, there are some Distance
values that are not columns in DF2
. Just in case this could happen in your real data, here is a slightly more complicated solution where the matching columns are created in the result and the missing columns are added as NA
values:
DF1$in_df2 = DF1$Distance %in% names(DF2)
result = DF2[c("Year", as.character(DF1$Distance[DF1$in_df2]))]
names(result)[-1] = DF1$ID[DF1$in_df2]
result[DF1$ID[!DF1$in_df2]] = NA
result
# Year A B E C D
# 1 2017 0.40 0.54 0.54 NA NA
# 2 2018 0.42 0.56 0.56 NA NA
# 3 2019 0.44 0.58 0.58 NA NA
# 4 2020 0.46 0.60 0.60 NA NA
# 5 2021 0.48 0.62 0.62 NA NA
Using this sample data:
DF1 = read.table(text = 'ID Distance
A 30
B 50
C 300
D 120
E 50', header = T)
DF2 = structure(list(Year = c(2017, 2018, 2019, 2020, 2021), `20` = c(0.26,
0.28, 0.30, 0.32, 0.34), `30` = c(0.4, 0.42, 0.44, 0.46, 0.48), `50` =
c(0.54, 0.56, 0.58, 0.6, 0.62), `70` = c(0.67, 0.68, 0.69, 0.70, 0.71 )),
row.names = c(NA, 5L), class = "data.frame")
Upvotes: 0