ynitSed
ynitSed

Reputation: 145

data table create row

I've looked extensively on stack overflow but wasn't able to find anything useful for my desired output.

To illustrate, consider the following example data frame:

      D     X     Y     Z     A     B     C    Total
 1   abc    2     3     4     7     2     1      19

The total corresponds to the sum of each row. For simplicity, let B=19, which is the total. The output I desire is:

    D     X     Y     Z     A     B     C    Total
 1  abc   1     2     3     4     5     2      B
 2  N/A   1/B   2/B   3/B   4/B   5/B   2/B    1

Here, each element in the 1st row is divided by the total number, and this is reflected in the 2nd row. To create a column for total, I used mutate and did this:

df <- df %>% mutate(Total = X + Y + Z + A + B + C)

But I wasn't able to figure out how to create a row where each element is divisible by the Total number.

Any help would be appreciated! I wouldn't mind using mutate or data.table in doing this, as I used data.table to create a big dataframe.

EDIT1: I'm really sorry for not mentioning this, but a column includes some strings. I have editted the above to reflect this.

Upvotes: 1

Views: 78

Answers (2)

jmuhlenkamp
jmuhlenkamp

Reputation: 2160

Here's a dplyr answer to your question. What you actually want to do might be more complex, but this simple bind_rows, filter, and mutate_all works for the simple provided example.

library(dplyr)
df <- data.frame(x = 2:3, y = 3:4, z = letters[1:2], total = c(0, 19))
bind_rows(
    df,
    filter(df, row_number() == n()) %>%
        mutate_if(is.numeric, funs(. / total))
)

# x         y z total
# 1 2.0000000 3.0000000 a     0
# 2 3.0000000 4.0000000 b    19
# 3 0.1578947 0.2105263 b     1

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 389335

I have added one more row to make the solution more general.

In base R, we can divide the dataframe by the Total column in that row and then rbind it with original dataframe.

new_df <- rbind(df, df/df[, "Total"])
new_df

#           X         Y         Z         A         B          C Total
#1  2.0000000 3.0000000 4.0000000 7.0000000 2.0000000 1.00000000    19
#2  1.0000000 2.0000000 5.0000000 6.0000000 7.0000000 4.00000000    25
#11 0.1052632 0.1578947 0.2105263 0.3684211 0.1052632 0.05263158     1
#21 0.0400000 0.0800000 0.2000000 0.2400000 0.2800000 0.16000000     1

If the order is important and you want to maintain it, then we can just reorder it

rbind(new_df[c(T, F),], new_df[c(F, T),])

#           X         Y         Z         A         B          C Total
#1  2.0000000 3.0000000 4.0000000 7.0000000 2.0000000 1.00000000    19
#11 0.1052632 0.1578947 0.2105263 0.3684211 0.1052632 0.05263158     1
#2  1.0000000 2.0000000 5.0000000 6.0000000 7.0000000 4.00000000    25
#21 0.0400000 0.0800000 0.2000000 0.2400000 0.2800000 0.16000000     1

EDIT

If there are certain columns which are string we can ignore them and use bind_rows instead of rbind as it directly returns NA for non-matching columns.

library(dplyr)
bind_rows(df1, df1[!names(df1) %in% "D"]/df1[, "Total"])

#         X         Y         Z         A         B          C  Total    D
#1 2.0000000 3.0000000 4.0000000 7.0000000 2.0000000 1.00000000    19  abc
#2 1.0000000 2.0000000 5.0000000 6.0000000 7.0000000 4.00000000    25  def
#3 0.1052632 0.1578947 0.2105263 0.3684211 0.1052632 0.05263158     1 <NA>
#4 0.0400000 0.0800000 0.2000000 0.2400000 0.2800000 0.16000000     1 <NA>

data

df <- structure(list(X = c(2, 1), Y = c(3, 2), Z = c(4, 5), A = c(7, 
  6), B = c(2, 7), C = c(1, 4), Total = c(19, 25)), .Names = c("X", 
  "Y", "Z", "A", "B", "C", "Total"), row.names = c("1", "2"), class = "data.frame")

df1 <-structure(list(X = c(2, 1), Y = c(3, 2), Z = c(4, 5), A = c(7, 
6), B = c(2, 7), C = c(1, 4), Total = c(19, 25), D = c("abc", 
"def")), .Names = c("X", "Y", "Z", "A", "B", "C", "Total", "D"
 ), row.names = c("1", "2"), class = "data.frame")

Upvotes: 1

Related Questions