Reputation: 145
I've looked extensively on stack overflow but wasn't able to find anything useful for my desired output.
To illustrate, consider the following example data frame:
D X Y Z A B C Total
1 abc 2 3 4 7 2 1 19
The total corresponds to the sum of each row. For simplicity, let B=19, which is the total. The output I desire is:
D X Y Z A B C Total
1 abc 1 2 3 4 5 2 B
2 N/A 1/B 2/B 3/B 4/B 5/B 2/B 1
Here, each element in the 1st row is divided by the total number, and this is reflected in the 2nd row. To create a column for total, I used mutate and did this:
df <- df %>% mutate(Total = X + Y + Z + A + B + C)
But I wasn't able to figure out how to create a row where each element is divisible by the Total number.
Any help would be appreciated! I wouldn't mind using mutate or data.table in doing this, as I used data.table to create a big dataframe.
EDIT1: I'm really sorry for not mentioning this, but a column includes some strings. I have editted the above to reflect this.
Upvotes: 1
Views: 78
Reputation: 2160
Here's a dplyr
answer to your question. What you actually want to do might be more complex, but this simple bind_rows
, filter
, and mutate_all
works for the simple provided example.
library(dplyr)
df <- data.frame(x = 2:3, y = 3:4, z = letters[1:2], total = c(0, 19))
bind_rows(
df,
filter(df, row_number() == n()) %>%
mutate_if(is.numeric, funs(. / total))
)
# x y z total
# 1 2.0000000 3.0000000 a 0
# 2 3.0000000 4.0000000 b 19
# 3 0.1578947 0.2105263 b 1
Upvotes: 1
Reputation: 389335
I have added one more row to make the solution more general.
In base R, we can divide the dataframe by the Total
column in that row and then rbind
it with original dataframe.
new_df <- rbind(df, df/df[, "Total"])
new_df
# X Y Z A B C Total
#1 2.0000000 3.0000000 4.0000000 7.0000000 2.0000000 1.00000000 19
#2 1.0000000 2.0000000 5.0000000 6.0000000 7.0000000 4.00000000 25
#11 0.1052632 0.1578947 0.2105263 0.3684211 0.1052632 0.05263158 1
#21 0.0400000 0.0800000 0.2000000 0.2400000 0.2800000 0.16000000 1
If the order is important and you want to maintain it, then we can just reorder it
rbind(new_df[c(T, F),], new_df[c(F, T),])
# X Y Z A B C Total
#1 2.0000000 3.0000000 4.0000000 7.0000000 2.0000000 1.00000000 19
#11 0.1052632 0.1578947 0.2105263 0.3684211 0.1052632 0.05263158 1
#2 1.0000000 2.0000000 5.0000000 6.0000000 7.0000000 4.00000000 25
#21 0.0400000 0.0800000 0.2000000 0.2400000 0.2800000 0.16000000 1
EDIT
If there are certain columns which are string we can ignore them and use bind_rows
instead of rbind
as it directly returns NA
for non-matching columns.
library(dplyr)
bind_rows(df1, df1[!names(df1) %in% "D"]/df1[, "Total"])
# X Y Z A B C Total D
#1 2.0000000 3.0000000 4.0000000 7.0000000 2.0000000 1.00000000 19 abc
#2 1.0000000 2.0000000 5.0000000 6.0000000 7.0000000 4.00000000 25 def
#3 0.1052632 0.1578947 0.2105263 0.3684211 0.1052632 0.05263158 1 <NA>
#4 0.0400000 0.0800000 0.2000000 0.2400000 0.2800000 0.16000000 1 <NA>
data
df <- structure(list(X = c(2, 1), Y = c(3, 2), Z = c(4, 5), A = c(7,
6), B = c(2, 7), C = c(1, 4), Total = c(19, 25)), .Names = c("X",
"Y", "Z", "A", "B", "C", "Total"), row.names = c("1", "2"), class = "data.frame")
df1 <-structure(list(X = c(2, 1), Y = c(3, 2), Z = c(4, 5), A = c(7,
6), B = c(2, 7), C = c(1, 4), Total = c(19, 25), D = c("abc",
"def")), .Names = c("X", "Y", "Z", "A", "B", "C", "Total", "D"
), row.names = c("1", "2"), class = "data.frame")
Upvotes: 1