Reputation: 135
I am a beginner in R, but up to now I have been able to find answers to all my R coding questions. This time I don't even know where to start, so I would be grateful for your help. I have a .csv file; a fragment of my data is below:
    year                 1k    2k    3k    4k    5k
 1  1981-01-01 00:00:00  NA    NA    NA    NA    NA
 2  1981-01-02 00:00:00  NA    NA    NA    NA    NA
 3  1981-01-03 00:00:00  NA    NA    NA    NA    NA
 4  1981-01-04 00:00:00  NA    NA    COLD  COLD  COLD
 5  1981-01-05 00:00:00  NA    NA    NA    COLD  NA
 6  1981-01-06 00:00:00  COLD  NA    NA    COLD  NA
 7  1981-01-07 00:00:00  COLD  NA    NA    COLD  NA
 8  1981-01-08 00:00:00  COLD  NA    NA    COLD  COLD
 9  1981-01-09 00:00:00  COLD  NA    NA    NA    NA
10  1981-01-10 00:00:00  NA    NA    NA    NA    NA
11  1981-01-11 00:00:00  NA    COLD  NA    NA    NA
12  1981-01-12 00:00:00  NA    COLD  NA    COLD  COLD
13  1981-01-13 00:00:00  NA    NA    NA    NA    NA
14  1981-01-14 00:00:00  COLD  NA    NA    NA    NA
15  1981-01-15 00:00:00  NA    NA    NA    NA    NA
16  1981-01-16 00:00:00  COLD  NA    NA    NA    NA
17  1981-01-17 00:00:00  NA    NA    NA    NA    NA
18  1981-01-18 00:00:00  NA    NA    NA    COLD  NA
19  1981-01-19 00:00:00  NA    NA    NA    COLD  NA
20  1981-01-20 00:00:00  NA    NA    NA    COLD  NA
I know how to count the total number of COLD records, but I need a specific condition: count every event of 3 or more consecutive COLD records, and then sum those events separately for each column (k1, k2, k3, etc.). A run longer than 3 still counts as a single event. For the example data, the result should be k1 = 1, k2 = 0, k3 = 0, k4 = 2, k5 = 0. I was thinking of using repeat loops, but I really don't know where to start.
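Because the data come from a .csv file with headers such as 1k, note that read.csv() passes column names through make.names() by default, which turns 1k into X1k. Below is a minimal sketch of reading the file while keeping the original names. The real file name is unknown, so a temporary file written on the spot stands in for it:

```r
# Write a tiny stand-in for the real .csv (the real file name is unknown)
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "year,1k,2k",
  "1981-01-01 00:00:00,NA,COLD",
  "1981-01-02 00:00:00,COLD,NA"
), csv_path)

# check.names = FALSE keeps "1k" instead of renaming it to "X1k";
# the literal text NA is read as a missing value (default na.strings = "NA")
df <- read.csv(csv_path, check.names = FALSE, stringsAsFactors = FALSE)

names(df)     # "year" "1k" "2k"
df[["1k"]]    # NA "COLD"
```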
Upvotes: 2
Views: 192
Reputation: 76
I think the best option is to use loops as you said. I rewrote your example below, so it's reproducible.
col_year <- c("1981-01-01 00:00:00","1981-01-02 00:00:00","1981-01-03 00:00:00","1981-01-04 00:00:00",
"1981-01-05 00:00:00","1981-01-06 00:00:00","1981-01-07 00:00:00","1981-01-08 00:00:00",
"1981-01-09 00:00:00","1981-01-10 00:00:00","1981-01-11 00:00:00","1981-01-12 00:00:00",
"1981-01-13 00:00:00","1981-01-14 00:00:00","1981-01-15 00:00:00","1981-01-16 00:00:00",
"1981-01-17 00:00:00","1981-01-18 00:00:00","1981-01-19 00:00:00","1981-01-20 00:00:00")
k1 <- c(rep(NA,5),rep("COLD",4),rep(NA,4),"COLD",NA,"COLD",rep(NA,4))
k2 <- c(rep(NA,10),rep("COLD",2),rep(NA,8))
k3 <- c(rep(NA,3),"COLD",rep(NA,16))
k4 <- c(rep(NA,3),rep("COLD",5),rep(NA,3),"COLD",rep(NA,5),rep("COLD",3))
k5 <- c(rep(NA,3),"COLD",rep(NA,3),"COLD",rep(NA,3),"COLD",rep(NA,8))
df <- data.frame(year = col_year, k1, k2, k3, k4, k5, stringsAsFactors = FALSE)  # keep text columns as character
I will subset the data frame so that indexing is easier, as done by @denisafonin (answer excluded). Then I will initialize two counters: one for the number of consecutive "COLD" records and one for the number of groups of three or more "COLD" records. An inner for loop does the counting within a column, and an outer for loop repeats it for each column. Here is the code:
# subset columns: drop the year column
n_total_columns <- ncol(df)
df2 <- df[2:n_total_columns]
# initializing variables
n_rows <- nrow(df2)
n_cols <- ncol(df2)
total_counts <- rep(0, n_cols)
names(total_counts) <- names(df2)
# for each k#
for (j in 1:n_cols) {
  cold_counts <- 0
  for (i in 1:n_rows) {
    # case for NA's
    if (is.na(df2[i, j])) {
      # reset to zero
      cold_counts <- 0
    } else {
      if (df2[i, j] == "COLD") {
        # one more consecutive COLD record
        cold_counts <- cold_counts + 1
      } else {
        # reset to zero
        cold_counts <- 0
      }
    }
    # the run has just reached 3 consecutive COLD records:
    # count it once, even if it continues further
    if (cold_counts == 3) {
      total_counts[j] <- total_counts[j] + 1
    }
  }
}
# your result
print(total_counts)
# or in a data.frame
df_final <- data.frame(k_j = names(total_counts),total_counts)
print(df_final)
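If you need this count for other files or thresholds, you can fold the nested loop into a function that handles one column and then apply it to every column with sapply(). This is only a sketch: count_cold_runs and its min_len argument are hypothetical names, and the logic is the same counter reset used in the loop of this answer.

```r
# Hypothetical helper: the counter-reset logic of the nested loop, for one column
count_cold_runs <- function(x, min_len = 3) {
  x <- as.character(x)   # also handles factor columns
  run <- 0               # length of the current run of "COLD"
  events <- 0            # number of runs that reached min_len
  for (value in x) {
    if (!is.na(value) && value == "COLD") {
      run <- run + 1
    } else {
      run <- 0
    }
    # count each run once, at the moment it reaches min_len
    if (run == min_len) events <- events + 1
  }
  events
}

# Example data (same columns as the reproducible example)
df_k <- data.frame(
  k1 = c(rep(NA, 5), rep("COLD", 4), rep(NA, 4), "COLD", NA, "COLD", rep(NA, 4)),
  k2 = c(rep(NA, 10), rep("COLD", 2), rep(NA, 8)),
  k3 = c(rep(NA, 3), "COLD", rep(NA, 16)),
  k4 = c(rep(NA, 3), rep("COLD", 5), rep(NA, 3), "COLD", rep(NA, 5), rep("COLD", 3)),
  k5 = c(rep(NA, 3), "COLD", rep(NA, 3), "COLD", rep(NA, 3), "COLD", rep(NA, 8)),
  stringsAsFactors = FALSE
)
sapply(df_k, count_cold_runs)   # k1 = 1, k2 = 0, k3 = 0, k4 = 2, k5 = 0
```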
If there were no NAs in your data, you could remove the if(is.na(df2[i,j])) check and keep only what is inside its else branch.
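Here is a related shortcut, offered as a suggestion rather than as part of the original answer. x %in% "COLD" returns FALSE (not NA) for missing values, whereas x == "COLD" returns NA, and if() on NA stops with an error. A test built with %in% therefore lets you drop the separate is.na() branch even when the data do contain NAs. The vector x is made up for illustration:

```r
x <- c(NA, "COLD", "COLD", NA)

x == "COLD"     # NA TRUE TRUE NA: comparisons with NA stay NA
x %in% "COLD"   # FALSE TRUE TRUE FALSE: NA is treated as "not COLD"

# The inner loop body could then be written without is.na()
cold_counts <- 0
for (value in x) {
  cold_counts <- if (value %in% "COLD") cold_counts + 1 else 0
}
cold_counts     # 0: the last record is NA, so the run was reset
```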
Upvotes: 1
Reputation: 1136
Using the example by @rodrigo-lustosa:
col_year <- c("1981-01-01 00:00:00","1981-01-02 00:00:00","1981-01-03 00:00:00","1981-01-04 00:00:00",
"1981-01-05 00:00:00","1981-01-06 00:00:00","1981-01-07 00:00:00","1981-01-08 00:00:00",
"1981-01-09 00:00:00","1981-01-10 00:00:00","1981-01-11 00:00:00","1981-01-12 00:00:00",
"1981-01-13 00:00:00","1981-01-14 00:00:00","1981-01-15 00:00:00","1981-01-16 00:00:00",
"1981-01-17 00:00:00","1981-01-18 00:00:00","1981-01-19 00:00:00","1981-01-20 00:00:00")
k1 <- c(rep(NA,5),rep("COLD",4),rep(NA,4),"COLD",NA,"COLD",rep(NA,4))
k2 <- c(rep(NA,10),rep("COLD",2),rep(NA,8))
k3 <- c(rep(NA,3),"COLD",rep(NA,16))
k4 <- c(rep(NA,3),rep("COLD",5),rep(NA,3),"COLD",rep(NA,5),rep("COLD",3))
k5 <- c(rep(NA,3),"COLD",rep(NA,3),"COLD",rep(NA,3),"COLD",rep(NA,8))
df <- data.frame(year = col_year, k1, k2, k3, k4, k5, stringsAsFactors = FALSE)  # keep text columns as character
This for loop does the job:
for (col in colnames(df)[2:6]) {
  vec <- df[[col]]
  i <- 1
  total <- 0
  while (i <= length(vec)) {
    if (is.na(vec[i]) || vec[i] != "COLD") {
      # not a COLD record: move to the next row
      i <- i + 1
    } else {
      # measure the run of consecutive "COLD" records starting at row i
      run <- 0
      while (i <= length(vec) && !is.na(vec[i]) && vec[i] == "COLD") {
        run <- run + 1
        i <- i + 1
      }
      # count the run once if it is at least 3 records long
      if (run >= 3) {
        total <- total + 1
      }
    }
  }
  print(paste0(col, ": ", total))
}
Output:
[1] "k1: 1"
[1] "k2: 0"
[1] "k3: 0"
[1] "k4: 2"
[1] "k5: 0"
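Here is another loop-free option. It is not used in any of the answers here and is offered only as a sketch. Turn each column into a 0/1 indicator, pad it with a 0 at both ends, and take diff(). Run starts show up as 1 and run ends as -1, so each run's length is the difference between its end and start positions. count_runs_diff is a hypothetical name:

```r
count_runs_diff <- function(x, min_len = 3) {
  # 1 for "COLD", 0 for anything else (including NA)
  is_cold <- as.integer(x %in% "COLD")
  # padding with 0 makes runs touching either end of the vector detectable
  d <- diff(c(0L, is_cold, 0L))
  starts <- which(d == 1L)   # first position of each run
  ends <- which(d == -1L)    # one past the last position of each run
  sum(ends - starts >= min_len)
}

k4 <- c(rep(NA, 3), rep("COLD", 5), rep(NA, 3), "COLD", rep(NA, 5), rep("COLD", 3))
count_runs_diff(k4)                          # 2
count_runs_diff(c("COLD", "COLD", "COLD"))   # 1: a run covering the whole vector
```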
Upvotes: 1
Reputation: 1160
Use run length encoding (rle()) on each column. Here is a function that does it, assuming your data frame is df:
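rle_col = function(k_col, num = 3){
  k_col = as.character(k_col)   # works for factor columns too
  k_col[is.na(k_col)] = "NA"    # convert NAs to a string so the comparison below is never NA
  r = rle(k_col)                # run length encoding
  which_cold = r$values == "COLD"
  sum(r$lengths[which_cold] >= num)   # number of COLD runs of length >= num
}
sapply(df[-1], rle_col)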
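This example shows why rle_col converts NAs. rle() puts every NA in a run of its own, because NA compared with anything is NA. r$values == "COLD" is then NA for those runs, and indexing r$lengths with a logical vector that contains NA returns NA elements. Without the conversion, the final sum() would therefore be NA. The vector x is made up for illustration:

```r
x <- c(NA, NA, "COLD", "COLD", "COLD", NA)

r <- rle(x)
r$lengths                                   # 1 1 3 1: each NA is its own run
r$values == "COLD"                          # NA NA TRUE NA
sum(r$lengths[r$values == "COLD"] >= 3)     # NA

# Replacing NA with the string "NA" (as rle_col does) removes the NA comparisons
x2 <- x
x2[is.na(x2)] <- "NA"
r2 <- rle(x2)
sum(r2$lengths[r2$values == "COLD"] >= 3)   # 1
```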
Upvotes: 2
Reputation: 1763
We can use the rle function, assuming df is your data frame:
apply(df[, 2:6], 2, function(x) {
  r <- rle(x)   # compute the run length encoding once
  sum(r$lengths[which(r$values == "COLD")] >= 3)
})
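This version needs no NA conversion, for two reasons. First, apply() turns the data frame into a character matrix, so every x is a plain character vector. Second, which() returns only the positions that are TRUE and silently skips NA, so the NA runs never reach the sum. Here is a short check on a made-up vector:

```r
x <- c(NA, NA, "COLD", "COLD", "COLD", NA)
r <- rle(x)

r$values == "COLD"                               # NA NA TRUE NA
which(r$values == "COLD")                        # 3: NA entries are dropped
sum(r$lengths[which(r$values == "COLD")] >= 3)   # 1
```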
Upvotes: 1