Reputation: 33
The data set consists of sequences of numbers like those shown below. I need a new column that gives the average movement away from the first value. For example, in the first row the numbers are (1, 3, 5, 2), so the average movement is average(abs(3-1), abs(5-1), abs(2-1)), that is, 2.333.
Sl no Column A req
1 1 > 3 > 5 > 2 2.333
2 2 > 5 > 1 > 10 > 5 3.75
Is there a way to solve this?
Upvotes: 0
Views: 67
Reputation: 269664
1) dplyr/tidyr Assuming the input DF shown reproducibly in the Note at the end, separate Column A into separate rows, summarize it and join the new req column onto the original data frame.
library(dplyr)
library(tidyr)
Req <- DF %>%
  separate_rows(`Column A`, convert = TRUE) %>%
  group_by(`Sl no`) %>%
  summarize(req = mean(abs(`Column A`[-1] - `Column A`[1]))) %>%
  ungroup
DF %>% inner_join(Req)
## Joining, by = "Sl no"
## Sl no Column A req
## 1 1 1 > 3 > 5 > 2 2.333333
## 2 2 2 > 5 > 1 > 10 > 5 3.750000
2) Base R A base R solution uses AvgAbsDiff, which takes a character string of the form in Column A, reads it in using scan and computes req. It is then applied to each row.
AvgAbsDiff <- function(x) {
  z <- scan(text = x, sep = ">", quiet = TRUE)
  mean(abs(z[-1] - z[1]))
}
transform(DF, req = sapply(as.character(`Column A`), AvgAbsDiff),
  row.names = NULL, check.names = FALSE)
## Sl no Column A req
## 1 1 1 > 3 > 5 > 2 2.333333
## 2 2 2 > 5 > 1 > 10 > 5 3.750000
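To see what AvgAbsDiff does internally, here is a minimal worked sketch for the second row. The variable names x, z and moves are only for illustration.

```r
# Second row of Column A as a plain character string
x <- "2 > 5 > 1 > 10 > 5"

# scan() splits on ">" and returns a numeric vector
z <- scan(text = x, sep = ">", quiet = TRUE)   # 2 5 1 10 5

# absolute distance of every later value from the first value
moves <- abs(z[-1] - z[1])                     # 3 1 8 3

mean(moves)                                    # 3.75
```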
2a) This base solution does not use any *apply functions. Read in Column A into a data frame using read.table and then compute req from that.
r <- read.table(text = as.character(DF$`Column A`), header = FALSE,
  sep = ">", fill = TRUE)
transform(DF, req = rowMeans(abs(r[, -1] - r[, 1]), na.rm = TRUE),
  check.names = FALSE)
## Sl no Column A req
## 1 1 1 > 3 > 5 > 2 2.333333
## 2 2 2 > 5 > 1 > 10 > 5 3.750000
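It may help to look at the intermediate data frame that read.table produces. This is a small sketch that rebuilds the input as a character column (an assumption, to keep it self-contained) and uses fill = TRUE so that shorter rows are padded with NA.

```r
# Self-contained copy of the input, with Column A as character
DF <- data.frame(`Sl no` = 1:2,
                 `Column A` = c("1 > 3 > 5 > 2", "2 > 5 > 1 > 10 > 5"),
                 check.names = FALSE, stringsAsFactors = FALSE)

# One column per position in the sequence; short rows are padded with NA
r <- read.table(text = DF$`Column A`, header = FALSE, sep = ">", fill = TRUE)
r
##   V1 V2 V3 V4 V5
## 1  1  3  5  2 NA
## 2  2  5  1 10  5

# Subtract the first column from the rest; na.rm = TRUE skips the padding
req <- rowMeans(abs(r[, -1] - r[, 1]), na.rm = TRUE)
```

Note that read.table infers the number of columns from the first five lines of input, so for larger data with longer sequences further down it is probably safer to pass col.names explicitly.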
Note
DF <-
  structure(list(`Sl no` = 1:2, `Column A` = structure(1:2,
    .Label = c("1 > 3 > 5 > 2",
    "2 > 5 > 1 > 10 > 5"), class = "factor")),
    class = "data.frame", row.names = c(NA, -2L))
Upvotes: 1
Reputation: 86
The following code works, but it could be written in a nicer way. Anyhow ..
Let me first re-create your dataframe
q <- data.frame(Sl_no=1:2, Column_A=c(("1 > 3 > 5 > 2"), (" 2 > 5 > 1 > 10 > 5")))
q$req <- NA
Which gives us:
Sl_no Column_A req
1 1 > 3 > 5 > 2 NA
2 2 > 5 > 1 > 10 > 5 NA
Next, let's split each sequence into an R list:
q$A <- strsplit(as.character(q$Column_A), " > ")
And also add the length of this list:
q$A_length <- lengths(q$A)
Then, let's loop over the rows. For each row, the inner loop runs from element 2 to the last element of the sequence, subtracts the first number from each element, takes the absolute value and collects the results in a vector. At the end of the inner loop we store the mean of these values in the corresponding row of the req column.
for (row in 1:nrow(q)) {
  # absolute differences from the first value of this row
  diffs <- c()
  for (number in 2:q$A_length[row]) {
    diffs <- c(diffs, abs(as.numeric(q$A[[row]][number]) - as.numeric(q$A[[row]][1])))
  }
  q$req[row] <- mean(diffs)
}
Finally, let's print the created dataframe:
Sl_no Column_A A A_length req
1 1 > 3 > 5 > 2 1, 3, 5, 2 4 2.333333
2 2 > 5 > 1 > 10 > 5 2, 5, 1, 10, 5 5 3.750000
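As for the nicer way: the two nested loops can probably be collapsed into one vapply call. This is only a sketch; avg_abs_diff is a hypothetical helper name, and q is rebuilt here so the snippet runs on its own.

```r
# Same data as above, kept as character strings
q <- data.frame(Sl_no = 1:2,
                Column_A = c("1 > 3 > 5 > 2", "2 > 5 > 1 > 10 > 5"),
                stringsAsFactors = FALSE)

# Hypothetical helper: mean absolute distance from the first value
avg_abs_diff <- function(s) {
  v <- as.numeric(strsplit(s, ">", fixed = TRUE)[[1]])  # as.numeric ignores the spaces
  mean(abs(v[-1] - v[1]))
}

# vapply checks that each result is a single number
q$req <- vapply(q$Column_A, avg_abs_diff, numeric(1), USE.NAMES = FALSE)
```

This avoids the helper columns A and A_length and gives 2.333333 and 3.75 for the two rows.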
Hope this helps!
Upvotes: 0
Reputation: 2031
You can do something like this:
data$req <- sapply(data$columnA,
function(x) mean(abs(diff(as.integer(unlist(strsplit(x, ' > ')))))))
sapply will apply the function to each element of columnA.
Then, working from the inside outwards:
strsplit splits the element in columnA on " > " resulting in:
[[1]]
[1] "1" "3" "5" "2"
unlist will turn the result into a vector:
"1" "3" "5" "2"
as.integer to convert it to numbers:
1 3 5 2
diff to compute the differences between the numbers:
2 2 -3
abs to take the absolute values:
2 2 3
and finally mean to take the average:
2.333333
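Edit: I misread the question slightly. diff compares consecutive values, whereas the question asks for the distance of each value from the first one. This will calculate the result correctly:
my.df$req <- sapply(my.df$columnA,
  function(x) {
    tmp <- as.integer(unlist(strsplit(x, ' > ')))
    mean(abs(tmp[2:length(tmp)] - tmp[1]))
  })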
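The two versions agree on the first row by coincidence, so here is a quick check on the second row, where they differ. The vector v is just the second sequence typed in by hand.

```r
v <- c(2, 5, 1, 10, 5)

# consecutive steps: abs(3, -4, 9, -5)
step_mean <- mean(abs(diff(v)))        # 5.25

# distance from the first value: abs(3, -1, 8, 3)
first_mean <- mean(abs(v[-1] - v[1]))  # 3.75, the value asked for
```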
Upvotes: 0