Sarah Ashley

Reputation: 23

Look up data frame with values stored in another data frame

I have 15 data frames containing information about patient visits for a group of patients, named FA.OFC1, FA.OFC2, etc. An example is shown below.

ID       sex   date        age.yrs   important.var   etc...
xx_111   F     xx.xx.xxxx  x.x       x

I am generating a summary data frame (sev.scores) which contains information about the most severe episode a patient has across all recorded data. I have successfully used the which.max function to get the most severe episode but now need additional information about that particular episode.

I recreated the name of the data frame I will need to look up to get the additional information by pasting information after the max return:

max        data frame
8          df2

Specifically, the names() function gave me the name of the column with the most severe episode in the summary data frame sev.scores, which also tells me which data frame to look up:

score_cols <- c(5,8,11,14,17,20,23,26,29,32,35,38,41,44,47,50)
scores <- sev.scores[score_cols]
sev.scores[52:53] <- as.data.frame(cbind(
    row.names(scores),
    apply(scores, 1, function(x) names(scores)[which(x == max(x))])
))

However, I would now like to figure out how to tell R to take the data frame name stored in that column and search that data frame for the entry in the 5th column.

So in the example above, the information about the most severe episode is stored in data frame 2 (df2), and I need to take the value from the 5th column (important.var) and return it to this summary data frame.

UPDATE

I have now stored these dfs in a list but am still having some trouble getting the information I would like.

I found the following example for getting the max value from a list

lapply(L1, function(x) x[which.max(abs(x))])

How can I adapt this for a factor which is present in all elements of the list?

e.g. something like:

lapply(my_dfs[[all elements]]["factor of interest"], function(x) x[which.max(abs(x))])

Upvotes: 2

Views: 107

Answers (1)

Konrad Rudolph

Reputation: 546053

If I may suggest a fundamentally different approach: concatenate all your data.frames into one (rbind), and add a separate column that describes the nature of the original data.frame. For this, it’s necessary to know in which regard the original data.frames differed (e.g. by disease type; since I don’t know your data, let’s stick with this for my example).
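As a rough illustration of that concatenation step, here is a minimal base R sketch. The two toy data frames stand in for the real FA.OFC1, FA.OFC2, ... tables, and the column name `source` is a hypothetical label for where each row came from; the real tables would carry more columns.

```r
# Toy stand-ins for the visit data frames (assumed shape, not the real data).
FA.OFC1 <- data.frame(ID = c("p1", "p2"), important.var = c(3, 7))
FA.OFC2 <- data.frame(ID = c("p1", "p2"), important.var = c(8, 2))

# Collect the data frames in a named list.
my_dfs <- list(FA.OFC1 = FA.OFC1, FA.OFC2 = FA.OFC2)

# Tag every row with the name of the data frame it came from.
# `source` is a hypothetical column name chosen for this sketch.
tagged <- Map(function(df, name) cbind(df, source = name),
              my_dfs, names(my_dfs))

# Stack everything into one long data frame and drop the generated row names.
all_data <- do.call(rbind, tagged)
rownames(all_data) <- NULL
```

This only works as written if all data frames share the same column names, which is one reason to tidy them first.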

Furthermore, you need to ensure that your data is in tidy data format. This is an easy requirement to satisfy, because your data should be in this format anyway!

Then, once you have all the data in a single data.frame, you can create a summary trivially by selecting the most severe episode for each patient:

library(dplyr)

# For each patient ID, keep only the row with the highest FactorOfInterest.
sev_scores = all_data %>%
    group_by(ID) %>%
    filter(row_number() == which.max(FactorOfInterest))

Note that this code uses the dplyr package. You can perform an equivalent analysis using different packages (e.g. data.table) or base R functions, but I strongly recommend dplyr: the resulting code is generally easier to understand.

Rather than your sev.scores table, which has columns referring to rows and data.frame names, the sev_scores I created above will contain the actual data for the most severe episode for each patient ID.
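For comparison, the base R route mentioned earlier could look roughly like the sketch below. The toy `all_data` and its `source` and `FactorOfInterest` columns are placeholders for illustration, not the real data.

```r
# Toy combined data; column names are assumptions made for this sketch.
all_data <- data.frame(
  ID = c("p1", "p1", "p2", "p2"),
  source = c("FA.OFC1", "FA.OFC2", "FA.OFC1", "FA.OFC2"),
  FactorOfInterest = c(3, 8, 7, 2)
)

# Split the rows by patient ID.
by_patient <- split(all_data, all_data$ID)

# Within each patient, keep the single row with the largest score.
top_rows <- lapply(by_patient, function(d) d[which.max(d$FactorOfInterest), ])

# Restack into one summary data frame, one row per patient.
sev_scores <- do.call(rbind, top_rows)
rownames(sev_scores) <- NULL
```

Note that which.max() returns only the first maximum, so a patient with tied scores still yields a single row.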

Upvotes: 1
