Reputation: 99
I tried looking for a similar question but couldn't find one. If you do please let me know!
I've been working on a project looking at grain staples
Here is a subset of my dataset:
nutrient.component. grain nutrients
1 Beta-carotene (μg) White Rice 0.00
2 Beta-carotene (μg) Brown Rice NA
3 Calcium (mg) White Rice 28.00
4 Calcium (mg) Brown Rice 23.00
5 Carbohydrates (g) White Rice 80.00
6 Carbohydrates (g) Brown Rice 77.00
7 Copper (mg) White Rice 0.22
8 Copper (mg) Brown Rice NA
9 Energy (kJ) White Rice 1528.00
10 Energy (kJ) Brown Rice 1549.00
11 Fat (g) White Rice 0.66
12 Fat (g) Brown Rice 2.92
13 Fiber (g) White Rice 1.30
14 Fiber (g) Brown Rice 3.50
15 Folate Total (B9) (μg) White Rice 8.00
16 Folate Total (B9) (μg) Brown Rice 20.00
17 Iron (mg) White Rice 0.80
18 Iron (mg) Brown Rice 1.47
19 Lutein+zeaxanthin (μg) White Rice 0.00
20 Lutein+zeaxanthin (μg) Brown Rice NA
21 Magnesium (mg) White Rice 25.00
22 Magnesium (mg) Brown Rice 143.00
23 Manganese (mg) White Rice 1.09
24 Manganese (mg) Brown Rice 3.74
25 Monounsaturated fatty acids (g) White Rice 0.21
26 Monounsaturated fatty acids (g) Brown Rice 1.05
27 Niacin (B3) (mg) White Rice 1.60
28 Niacin (B3) (mg) Brown Rice 5.09
29 Pantothenic acid (B5) (mg) White Rice 1.01
30 Pantothenic acid (B5) (mg) Brown Rice 1.49
31 Phosphorus (mg) White Rice 115.00
32 Phosphorus (mg) Brown Rice 333.00
33 Polyunsaturated fatty acids (g) White Rice 0.18
34 Polyunsaturated fatty acids (g) Brown Rice 1.04
35 Potassium (mg) White Rice 115.00
36 Potassium (mg) Brown Rice 223.00
37 Protein (g) White Rice 7.10
38 Protein (g) Brown Rice 7.90
39 Riboflavin (B2)(mg) White Rice 0.05
40 Riboflavin (B2)(mg) Brown Rice 0.09
41 Saturated fatty acids (g) White Rice 0.18
42 Saturated fatty acids (g) Brown Rice 0.58
43 Selenium (μg) White Rice 15.10
44 Selenium (μg) Brown Rice NA
45 Sodium (mg) White Rice 5.00
46 Sodium (mg) Brown Rice 7.00
47 Sugar (g) White Rice 0.12
48 Sugar (g) Brown Rice 0.85
49 Thiamin (B1)(mg) White Rice 0.07
50 Thiamin (B1)(mg) Brown Rice 0.40
51 Vitamin A (IU) White Rice 0.00
52 Vitamin A (IU) Brown Rice 0.00
53 Vitamin B6 (mg) White Rice 0.16
54 Vitamin B6 (mg) Brown Rice 0.51
55 Vitamin C (mg) White Rice 0.00
56 Vitamin C (mg) Brown Rice 0.00
57 Vitamin E, alpha-tocopherol (mg) White Rice 0.11
58 Vitamin E, alpha-tocopherol (mg) Brown Rice 0.59
59 Vitamin K1 (μg) White Rice 0.10
60 Vitamin K1 (μg) Brown Rice 1.90
61 Water (g) White Rice 12.00
62 Water (g) Brown Rice 10.00
63 Zinc (mg) White Rice 1.09
64 Zinc (mg) Brown Rice 2.02
Brown Rice has four NA values.
Based on this graphic,
I think it would be fair to assume that the NA values for Brown Rice will be very close to the equivalent value for White Rice. And that it would be more accurate to mirror the white rice values rather than convert the values to zero.
My question is, besides manually looking up and inputting the white rice equivalent nutrient for brown rice, how would the code look to replace NA with the equivalent value for white rice? I would expect the result to convert the NA value for Copper; Brown Rice to be the same value as Copper; White Rice (which is 0.22). Would it be better to first replace NA with zero? But if I do that then I have six nutrients with a value of zero rather than four values with NA. I am trying to figure out the right mindset to solve this through code. Any insight into this would be greatly appreciated.
Thank you
Upvotes: 0
Views: 153
Reputation: 39154
Assuming that the data frame of your input data is called dt
, we can use the fill
function from the tidyr
package to achieve this task. dt2
is the final output.
library(tidyr)
dt2 <- dt %>% fill(nutrients)
dt2
nutrient.component. grain nutrients
1 1 Beta-carotene (µg) White Rice 0.00
2 2 Beta-carotene (µg) Brown Rice 0.00
3 3 Calcium (mg) White Rice 28.00
4 4 Calcium (mg) Brown Rice 23.00
5 5 Carbohydrates (g) White Rice 80.00
6 6 Carbohydrates (g) Brown Rice 77.00
7 7 Copper (mg) White Rice 0.22
8 8 Copper (mg) Brown Rice 0.22
...
The default of fill
will impute the NA
based on the previous and nearest non-NA row. So it is important to make sure that each brown rice record is exactly the next row of the associated white rice record.
Upvotes: 4
Reputation: 42544
The zoo
package has some useful functions to deal with NA
:
library(data.table)
setDT(DT)[, nutrients := zoo::na.aggregate(nutrients), by = nutrient.component][]
nutrient.component grain nutrients 1: Beta-carotene (<U+00B5>g) White Rice 0.00 2: Beta-carotene (<U+00B5>g) Brown Rice 0.00 3: Calcium (mg) White Rice 28.00 4: Calcium (mg) Brown Rice 23.00 5: Carbohydrates (g) White Rice 80.00 6: Carbohydrates (g) Brown Rice 77.00 7: Copper (mg) White Rice 0.22 8: Copper (mg) Brown Rice 0.22 9: Energy (kJ) White Rice 1528.00 10: Energy (kJ) Brown Rice 1549.00 11: Fat (g) White Rice 0.66 12: Fat (g) Brown Rice 2.92 13: Fiber (g) White Rice 1.30 14: Fiber (g) Brown Rice 3.50 15: Folate Total (B9) (<U+00B5>g) White Rice 8.00 16: Folate Total (B9) (<U+00B5>g) Brown Rice 20.00 17: Iron (mg) White Rice 0.80 18: Iron (mg) Brown Rice 1.47 19: Lutein+zeaxanthin (<U+00B5>g) White Rice 0.00 20: Lutein+zeaxanthin (<U+00B5>g) Brown Rice 0.00 ...
Note rows 2, 8, and 20.
data.table
is used here because it updates DT
in place which avoids copying the whole table to save memory and time.
Upvotes: 4
Reputation: 76450
I assume that your dataset is of class data.frame
and that it's named dat
.
I believe the following code will do it. It splits the df into lists of 2 or 1 rows (the last row in your example is missing Brown Rice). Then it checks if any of these lists have 2 rows and if the nutrients of Brown Rice are NA
. If so it assigns the value of White Rice. Then, collects the resulting list back in a data.frame
.
sp <- split(dat, dat$nutrient.component.)
res <- lapply(sp, function(x){
if(nrow(x) == 2 & is.na(x$nutrients[x$grain == "Brown Rice"]))
x$grain[x$grain == "Brown Rice"] <- "White Rice"
x
}
)
rm(sp) # tidy up
res <- do.call(rbind, res)
res
Upvotes: 3