Reputation: 103
I know it should be an easier or smarter way of doing what I need, but I haven't found it yet after several days.
I have 2 dataframes that I need to merge using a extra condition. For example:
df1 <- data.frame(Username = c("user1", "user2", "user3", "user4", "user5", "user6"))
df2 <- data.frame(File_Name = c(rep("StudyABC", 5), rep("AnotherStudyCDE", 4)), Username = c("user1", rep(c("user2", "user3", "user4", "user5"),2)))
print(df1)
print(df2)
What I need is to create 2 new columns in df1 called ABC and CDE that includes their "File_Name" values. Of course the real data is hundreds of lines and not ordered so no way of selecting by range.
One of the solutions (not elegant) that I have found is:
df2_filtered <- df2 %>% filter(str_detect(File_Name, "ABC"))
df1 <- left_join(df1, df2_filtered, by = "Username")
names(df1)[2] <- "ABC"
df2_filtered <- df2 %>% filter(str_detect(File_Name, "CDE"))
df1 <- left_join(df1, df2_filtered, by = "Username")
names(df1)[3] <- "CDE"
print(df1)
Is there a shortest way of doing it? Because I have to repeat the same logic 160 times.
Thanks
Upvotes: 1
Views: 56
Reputation: 388797
You can extract either "ABC"
or "CDE"
from File_Name
and cast the data into wide format. We can join the data with df1
to get all the Username
in the final dataframe.
library(dplyr)
df2 %>%
mutate(name = stringr::str_extract(File_Name, 'ABC|CDE')) %>%
tidyr::pivot_wider(names_from = name, values_from = File_Name) %>%
right_join(df1, by = 'Username')
# Username ABC CDE
# <chr> <chr> <chr>
#1 user1 StudyABC NA
#2 user2 StudyABC AnotherStudyCDE
#3 user3 StudyABC AnotherStudyCDE
#4 user4 StudyABC AnotherStudyCDE
#5 user5 StudyABC AnotherStudyCDE
#6 user6 NA NA
Upvotes: 1
Reputation: 767
If one study can have more than one file path (which I assume is the case from your previous attempts), just converting your data to a wide format before joining won't work as you'll have one column per file path, not per study.
One method in this case could be to use a for-loop to create an additional column in df2 with the study name, then convert the data to a wide format using pivot_wider
.
It's not a very R method though so I'd welcome suggestions to avoid creating the empty study
column and the for-loop
studies <- c("ABC", "CDE")
#create empty column named "study"
df2 <- df2 %>%
mutate(study = NA_character_)
for (i in studies) {
df2 <- df2 %>%
mutate(study = if_else(grepl(i, File_Name), i, study))
}
df2 <- df2 %>%
pivot_wider(names_from = study, values_from = File_Name)
> df2
# A tibble: 5 x 3
Username ABC CDE
<chr> <chr> <chr>
1 user1 StudyABC NA
2 user2 StudyABC AnotherStudyCDE
3 user3 StudyABC AnotherStudyCDE
4 user4 StudyABC AnotherStudyCDE
5 user5 StudyABC AnotherStudyCDE
df2
is now in a wide format and you can join it to df1
as before to get your desired output.
df3 <- left_join(df1, df2)
Upvotes: 1
Reputation: 2717
What you're looking for is a way of casting data from long to wide eg using data.table package I would do this:
library(data.table)
# converts data.frame to data.table
dt <- as.data.table(df2)
# I copy the file_name so one is used for the pivotting for long to wide and the other is used for filling in the data
dt[, study := File_Name]
dt_wide <- dcast(Username~File_Name, data=dt, value.var = "study")
# have a look at df2 in wide format
dt_wide[]
# now its just a direct merge to pull it back in to df1 and turn
# back in to data.frame for you
out <- merge(as.data.table(df1), dt_wide, by="Username", all.x=TRUE)
setDF(out)
out
Plenty of tutorials on melting/casting even without data.table. It's just knowing what to search for eg Google throws up https://ademos.people.uic.edu/Chapter8.html as the first result.
Upvotes: 1