Reputation: 274
I'm rather stuck. I need to perform some calculations on selected years, which are different for different locations based on additional calculations. I have a data frame that looks like the following sample df:
t0 <- 2017
tf <- 2046
pH_drift <- 0.5
fert_lim <- (pH_drift/0.1)*11.21 # amount of fertilizer needed to decrease pH level by the target drift threshold (i.e. pH_drift)
state<-c(rep("IL",90),rep("IN",90))
county <- c(rep("a",30),rep("b",30),rep("c",30),rep("d",30),rep("e",30),rep("f",30))
for (i in length(county)) {year <- c(2017:2046)}
nfert <- c(sample(20,90,replace = TRUE),sample(13,90,replace = TRUE))
sand <- c(sample(100,90,replace = TRUE),sample(75,90,replace=TRUE))
This creates a data frame with the following format: "state", "county", "nitrogen fertilization", and "sand", from year 2017 to 2046
e.g.
state county year nfert sand
1 IL a 2017 11 99
2 IL a 2018 8 84
3 IL a 2019 7 43
4 IL a 2020 7 57
5 IL a 2021 20 15
I need to identify years to apply lime based on mean fertilizer application (nfert) per state. That is, each state will have an average fertilizer load, and using the following formula (fert_lim/n_load
) will give me the number of years between liming events. With this information, I need to identify the years from 2017 to 2046 when lime will be applied for each state and perform some additional calculations for those years. That is, the interval between liming events will be different for each state.
I tried the following code to create a column with 1 and 0 identifying the years of liming for each state:
lime_y <- NULL
st <- c(unique(state))
for (i in 1:length(st)) {
n_load <- mean(nfert)
lime_int <- round(fert_lim/n_load,digits = 0)
lime_yr <- c(seq(t0,tf,by=lime_int))
flag <- period %in% lime_yr
lime_yi <- ifelse(flag, 1,0)
lime_y <- c(lime_y,lime_yi)
}
df <- data.frame(state,county,year,nfert,sand,lime_y)
but it uses the average for all states (in this case, 7). So I tried:
lime_y <- NULL
st <- c(unique(state))
for (i in 1:length(st)) {
n_load <- mean(df$nfert[st])
lime_int <- round(fert_lim/n_load,digits = 0)
lime_yr <- c(seq(t0,tf,by=lime_int))
flag <- period %in% lime_yr
lime_yi <- ifelse(flag, 1,0)
lime_y <- c(lime_y,lime_yi)
}
df <- data.frame(state,county,year,nfert,sand,lime_y)
To differenciate between states, but I get the following error:
Error in seq.default(2017, 2046, by = lime_int) : invalid '(to - from)/by'.
The output should look like this:
state county year nfert sand lime_y
1 IL a 2017 11 99 1
2 IL a 2018 8 84 0
3 IL a 2019 7 43 0
4 IL a 2020 7 57 0
5 IL a 2021 20 15 0
6 IL a 2022 17 75 0
7 IL a 2023 13 21 0
8 IL a 2024 6 87 1
9 IL a 2025 5 56 0
10 IL a 2026 9 11 0
11 IL a 2027 19 50 0
12 IL a 2028 10 6 0
13 IL a 2029 3 99 0
14 IL a 2030 16 10 0
15 IL a 2031 7 4 1
16 IL a 2032 6 4 0
17 IL a 2033 10 52 0
18 IL a 2034 20 47 0
19 IL a 2035 17 27 0
20 IL a 2036 12 54 0
21 IL a 2037 16 81 0
22 IL a 2038 10 84 1
23 IL a 2039 15 68 0
24 IL a 2040 19 33 0
25 IL a 2041 14 80 0
26 IL a 2042 18 48 0
27 IL a 2043 10 58 0
28 IL a 2044 3 68 0
29 IL a 2045 10 62 1
30 IL a 2046 20 94 0
31 IL b 2017 13 92 1
32 IL b 2018 10 85 0
33 IL b 2019 17 33 0
34 IL b 2020 2 94 0
35 IL b 2021 4 9 0
36 IL b 2022 8 61 0
37 IL b 2023 3 16 0
38 IL b 2024 11 59 1
39 IL b 2025 12 71 0
40 IL b 2026 11 87 0
41 IL b 2027 2 83 0
42 IL b 2028 11 2 0
43 IL b 2029 3 11 0
44 IL b 2030 1 22 0
45 IL b 2031 5 51 1
46 IL b 2032 9 61 0
47 IL b 2033 20 63 0
48 IL b 2034 4 70 0
49 IL b 2035 1 92 0
50 IL b 2036 14 73 0
51 IL b 2037 10 51 0
52 IL b 2038 6 14 1
53 IL b 2039 1 58 0
54 IL b 2040 1 89 0
55 IL b 2041 18 30 0
56 IL b 2042 10 34 0
57 IL b 2043 9 69 0
58 IL b 2044 20 77 0
59 IL b 2045 7 16 1
60 IL b 2046 3 74 0
61 IL c 2017 4 12 1
62 IL c 2018 1 89 0
63 IL c 2019 14 44 0
64 IL c 2020 4 62 0
65 IL c 2021 5 62 0
66 IL c 2022 7 47 0
67 IL c 2023 18 25 0
68 IL c 2024 16 85 1
69 IL c 2025 10 26 0
70 IL c 2026 15 65 0
71 IL c 2027 3 33 0
72 IL c 2028 1 43 0
73 IL c 2029 3 43 0
74 IL c 2030 5 37 0
75 IL c 2031 4 5 1
76 IL c 2032 14 16 0
77 IL c 2033 7 100 0
78 IL c 2034 16 10 0
79 IL c 2035 15 84 0
80 IL c 2036 15 25 0
81 IL c 2037 15 48 0
82 IL c 2038 3 59 1
83 IL c 2039 20 73 0
84 IL c 2040 1 12 0
85 IL c 2041 2 19 0
86 IL c 2042 5 85 0
87 IL c 2043 9 27 0
88 IL c 2044 9 90 0
89 IL c 2045 1 49 1
90 IL c 2046 4 89 0
91 IN d 2017 8 19 1
92 IN d 2018 10 74 0
93 IN d 2019 10 19 0
94 IN d 2020 8 15 0
95 IN d 2021 7 4 0
96 IN d 2022 7 28 0
97 IN d 2023 12 34 0
98 IN d 2024 10 22 1
99 IN d 2025 6 37 0
100 IN d 2026 12 18 0
101 IN d 2027 13 27 0
102 IN d 2028 1 59 0
103 IN d 2029 8 27 0
104 IN d 2030 6 65 0
105 IN d 2031 13 43 1
106 IN d 2032 7 49 0
107 IN d 2033 12 45 0
108 IN d 2034 7 55 0
109 IN d 2035 7 6 0
110 IN d 2036 11 29 0
111 IN d 2037 12 66 0
112 IN d 2038 2 38 1
113 IN d 2039 2 27 0
114 IN d 2040 6 44 0
115 IN d 2041 5 50 0
116 IN d 2042 6 61 0
117 IN d 2043 8 33 0
118 IN d 2044 10 43 0
119 IN d 2045 5 31 1
120 IN d 2046 5 42 0
121 IN e 2017 13 30 1
122 IN e 2018 10 58 0
123 IN e 2019 12 75 0
124 IN e 2020 7 2 0
125 IN e 2021 4 29 0
126 IN e 2022 9 18 0
127 IN e 2023 13 24 0
128 IN e 2024 7 74 1
129 IN e 2025 7 70 0
130 IN e 2026 2 29 0
131 IN e 2027 5 64 0
132 IN e 2028 2 34 0
133 IN e 2029 3 48 0
134 IN e 2030 9 37 0
135 IN e 2031 8 67 1
136 IN e 2032 10 8 0
137 IN e 2033 4 56 0
138 IN e 2034 2 38 0
139 IN e 2035 2 3 0
140 IN e 2036 3 35 0
141 IN e 2037 8 67 0
142 IN e 2038 5 69 1
143 IN e 2039 2 44 0
144 IN e 2040 6 63 0
145 IN e 2041 7 51 0
146 IN e 2042 1 16 0
147 IN e 2043 7 60 0
148 IN e 2044 6 36 0
149 IN e 2045 12 17 1
150 IN e 2046 1 26 0
151 IN f 2017 4 1 1
152 IN f 2018 1 71 0
153 IN f 2019 4 55 0
154 IN f 2020 5 34 0
155 IN f 2021 4 69 0
156 IN f 2022 11 74 0
157 IN f 2023 1 13 0
158 IN f 2024 9 8 1
159 IN f 2025 8 18 0
160 IN f 2026 13 59 0
161 IN f 2027 2 14 0
162 IN f 2028 12 64 0
163 IN f 2029 1 71 0
164 IN f 2030 2 59 0
165 IN f 2031 12 6 1
166 IN f 2032 10 21 0
[ reached 'max' / getOption("max.print") -- omitted 14 rows ]
BUT flagged years (1 and 0 in the lime_y column) should be different for each state.
Any help or suggestion would be appreciated.Thanks!
Upvotes: 0
Views: 186
Reputation: 909
Looks like you have a few syntax issues in your two loops.
In your first loop, you are getting an average of all counties because you have nload<-mean(cty)
, but your indexing variable is i
. By the way, you have specified that cty<-unique(county)
, meaning that cty
is a character vector of the six counties. That will simply return
cty
[1] a, b, c, d, e, f
Also, the way you have the code now, each variable is a separate vector, so your indexing variable will select by element number (i.e., when i = 5, the loop will select the fifth element of the vector cty
). I think what you want here is to create a dataframe first, and then feed that to your loop
##Your example data
state<-c(rep("IL",90),rep("IN",90))
county <-c(rep("a",30),rep("b",30),rep("c",30),rep("d",30),rep("e",30),rep("f",30))
year <- rep(c(2017:2046), 6)
nfert <- c(sample(20,90,replace = TRUE),sample(13,90,replace = TRUE))
sand <- c(sample(100,90,replace = TRUE),sample(75,90,replace=TRUE))
DF<-data.frame(State=state, County=county, Nfert=nfert, Sand = sand, Period = year) #Using values from your first code block
fertlim<-180
cty<-unique(DF$County)
tmp<-list()
for(i in 1:length(cty)){
subs<-subset(DF, County==cty[i])
nload<-mean(subs$Nfert)
lime_int<-as.integer(fertlim/nload)
print(lime_int)
lime_yr <- seq.int(t0,tf,by=lime_int)
subs[, "Flag"]<-0
subs$Flag[subs$Period %in% lime_yr]<-1
tmp[[i]]<-subs
}
OUT<-do.call(rbind, tmp)
The output will look like this
> head(OUT)
State County Nfert Sand Period Flag
1 IL a 18 18 2017 1
2 IL a 18 4 2018 0
3 IL a 18 49 2019 0
4 IL a 1 82 2020 0
5 IL a 15 82 2021 0
6 IL a 9 19 2022 0
The second loop has a similar problem where again you are attempting to select with cty (which is a character vector), but your indexing variable is i. I think the solution I gave to the first loop should work for you.
Take Care,
-Sean
Upvotes: 1