Reputation: 65
I have a csv dataset which is similar to the following:
Age Food_1_1 Food_1_2 Food_1_3 Amount_1_1 Amount_1_2 Amount_1_3
6 a b a 2 3 4
6 b b c 1 2 3
6 c a 4 1
9 c c b 1 3 1
9 c a b 1 2 1
Using R, I want to get the following data set which contains new set of columns a, b and c by adding the corresponding values:
Age Food_1_1 Food_1_2 Food_1_3 Amount_1_1 Amount_1_2 Amount_1_3 a b c
6 a b a 2 3 4 6 3 0
6 b b c 1 2 3 0 3 3
6 c a 4 1 1 0 4
9 c c b 1 3 1 0 1 4
9 c a b 1 2 1 2 1 1
Note: My data also contains missing values. The variables Food_1_1:Food_1_3 are factors and the variables Value1:Value3 are numeric. For more clearity: 1st row of column "a" contains the addition of all amount through Amount_1_1 to Amount_1_3 related to a (say for example 2+4 =6).
Upvotes: 1
Views: 139
Reputation: 4358
Using Darren Tsai's Example Data and in Base-R
we can get the values with
sapply(c("a","b","c"), function(x) apply(df,1, function(z) sum(as.numeric(z[grep(x,z)+3]))))
a b c
[1,] 6 3 0
[2,] 0 3 3
[3,] 1 0 4
[4,] 0 1 4
[5,] 2 1 1
If you want to append them to your data frame then we can
cbind(df, The-above-code-goes-here )
Age Food_1_1 Food_1_2 Food_1_3 Amount_1_1 Amount_1_2 Amount_1_3 a b c
1 6 a b a 2 3 4 6 3 0
2 6 b b c 1 2 3 0 3 3
3 6 <NA> c a NA 4 1 1 0 4
4 9 c c b 1 3 1 0 1 4
5 9 c a b 1 2 1 2 1 1
Edit: A breakdown of the code.
sapply(c("a","b","c"), function(x) ... )
This sapply
function takes the arguments from the list provided a
, b
, & c
, and runs the function in ...
one at a time but renames the arguments to x
so that they can be referenced.
The next layer is apply(df,1, function(z) ... )
This apply
function takes the argument df
- your data frame, and runs the function inside of it ...
row-by-row (specified with 1
).
grep(x,z)
is now finding which column the letter's x = a,b,c
are found in that specific row of the dataframe df = z
. Now that we've located the correct column(s) that contain the letters, the actual numbers are we want to sum are three columns to the the right, so we add grep(x,z)+3
to access those numbers.
lastly sum(as.numeric(z[grep(x,z)+3]))
We are summing z[ 'columns' ]
where if you recal z
is the row of your dataframe in every iteration of the apply 'loop'.
Upvotes: 1
Reputation: 35604
Stack Food
and Amount
to 2 columns and count observations by group.
library(tidyverse)
df %>%
rowid_to_column("id") %>%
pivot_longer(-(1:2), names_to = c(".value", "set"), names_sep = "_1_", values_drop_na = T) %>%
group_by(id, Food) %>%
tally(Amount) %>%
ungroup() %>%
pivot_wider(names_from = Food, values_from = n, values_fill = list(n = 0))
More concisely, use the argument values_fn
:
df %>%
rowid_to_column("id") %>%
pivot_longer(-(1:2), names_to = c(".value", "set"), names_sep = "_1_", values_drop_na = T) %>%
pivot_wider(id_cols = id, names_from = Food, values_from = Amount,
values_fill = list(Amount = 0), values_fn = list(Amount = sum))
# # A tibble: 5 x 4
# id a b c
# <int> <int> <int> <int>
# 1 1 6 3 0
# 2 2 0 3 3
# 3 3 1 0 4
# 4 4 0 1 4
# 5 5 2 1 1
Data
df <- structure(list(Age = c(6L, 6L, 6L, 9L, 9L), Food_1_1 = structure(c(1L,
2L, NA, 3L, 3L), .Label = c("a", "b", "c"), class = "factor"),
Food_1_2 = structure(c(2L, 2L, 3L, 3L, 1L), .Label = c("a",
"b", "c"), class = "factor"), Food_1_3 = structure(c(1L,
3L, 1L, 2L, 2L), .Label = c("a", "b", "c"), class = "factor"),
Amount_1_1 = c(2L, 1L, NA, 1L, 1L), Amount_1_2 = c(3L, 2L,
4L, 3L, 2L), Amount_1_3 = c(4L, 3L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA, -5L))
Upvotes: 2
Reputation: 939
A possible solution would be the following: First, consider I put your data in a txt file formatted as follows:
Age,Food_1_1,Food_1_2,Food_1_3,Amount_1_1,Amount_1_2,Amount_1_3
6,a,b,a,2,3,4
6,b,b,c,1,2,3
6,,c,a,,4,1
9,c,c,b,1,3,1
9,c,a,b,1,2,1
The code to read the file and compute the sums is the following:
data=read.csv("test_table.txt", stringsAsFactors = FALSE)
lwords=c("a","b","c")
for(aw in lwords){
data[,ncol(data)+1]=apply(data, 1, function(x, aw){
return(sum(as.numeric(x[5:7][aw==x[2:4]])))
}, aw=aw)
colnames(data)[ncol(data)]=aw
}
The idea is very simple. In the for-loop you check if a word (imagine "a") appears in columns 2 to 4. This generates a list of true/false. You then use this list of true/false to sum the content of the columns 5 to 7.
The result:
> data
Age Food_1_1 Food_1_2 Food_1_3 Amount_1_1 Amount_1_2 Amount_1_3 a b c
1 6 a b a 2 3 4 6 3 0
2 6 b b c 1 2 3 0 3 3
3 6 c a NA 4 1 1 0 4
4 9 c c b 1 3 1 0 1 4
5 9 c a b 1 2 1 2 1 1
Upvotes: 0