bLanton70
bLanton70

Reputation: 17

How to match data with two df in double loop in R?

I am having trouble matching data by the following key:

The first data frame contains the columns ID and Distance. As below.

ID Distance
A 30
B 50
C 300
D 120
E 50

The second data frame contains numbers that change in years (Year) and are different from the data included in Distance column of the first data frame. As below.

Year 30 50 120 300
2020 0.12 0.20 0.31 0.45
2021 0.14 0.23 0.33 0.46
2022 0.16 0.26 0.35 0.47
2023 0.18 0.29 0.37 0.48
2024 0.20 0.32 0.39 0.49

My question is. How do effectively assign these data in the third empty data frame, which contains different IDs characterized by different Distance and years for which the rate from Distance is varying. The third final df with examples of expected data is above. I am tryfing to use double loop but i am faild. Basically I am newbie in R and programming.

Year A B C D E
2020 ex. 0.20
2021 ex. 0.33
2022 ex. 0,26
2023 ex. 0.18
2024 ex. 0.39

Many thanks for any suggestions and solutions.

Upvotes: 0

Views: 66

Answers (1)

Gregor Thomas
Gregor Thomas

Reputation: 145745

If all of the DF1$Distance values are present in the DF2 column names (and vice versa), then this should work:

# select DF2 columns by the DF1 distance values
result = DF2[c("Year", as.character(DF1$Distance))] 
# change the names to the ID values
names(result)[-1] = DF1$ID)

In the small dput() sample, however, there are some Distance values that are not columns in DF2. Just in case this could happen in your real data, here is a slightly more complicated solution where the matching columns are created in the result and the missing columns are added as NA values:

DF1$in_df2 = DF1$Distance %in% names(DF2)

result = DF2[c("Year", as.character(DF1$Distance[DF1$in_df2]))]
names(result)[-1] =  DF1$ID[DF1$in_df2]
result[DF1$ID[!DF1$in_df2]] = NA
result
#   Year    A    B    E  C  D
# 1 2017 0.40 0.54 0.54 NA NA
# 2 2018 0.42 0.56 0.56 NA NA
# 3 2019 0.44 0.58 0.58 NA NA
# 4 2020 0.46 0.60 0.60 NA NA
# 5 2021 0.48 0.62 0.62 NA NA

Using this sample data:

DF1 = read.table(text = 'ID     Distance
A   30
B   50
C   300
D   120
E   50', header = T)

DF2 = structure(list(Year = c(2017, 2018, 2019, 2020, 2021), `20` = c(0.26,
 0.28, 0.30, 0.32, 0.34), `30` = c(0.4, 0.42, 0.44, 0.46, 0.48), `50` = 
c(0.54, 0.56, 0.58, 0.6, 0.62), `70` = c(0.67, 0.68, 0.69, 0.70, 0.71 )), 
row.names = c(NA, 5L), class = "data.frame")

Upvotes: 0

Related Questions