Reputation: 209
Hi Have some data that looks like this:
ID Item Class Value Date
1 Eggs A 5 07/07/21
2 Eggs A 4.5 07/07/21
3 Cereal C 2 07/07/21
4 Eggs B 3.5 07/07/21
5 Bread B 2.5 07/07/21
6 Juice A 3 07/07/21
7 Juice C 1.5 07/07/21
8 Eggs C 2 07/07/21
9 Bread A 3 07/07/21
10 Juice A 3 07/07/21
This is just a sample, the actual data has around 8k rows (and there are 300+ dataframes). I want to create a new df that
Item
column (e.g. there are 10 items in the column, four of which are Eggs
, thus the percentage of Eggs
is 40)Value
of each Item
groupIdeally, the final data would look something like this:
Item Percentage Average_Value Date
Eggs 40 3.75 07/07/21
Cereal 10 2 07/07/21
Bread 20 2.75 07/07/21
Juice 30 2.5 07/07/21
A couple of things to note - I want to do this to 300+ different csv's (where my data is stored) and create a single df with this new information. Each csv refers to a different date, so the final df would look something like the above, only with another four lines, but with different values and a different date. I know that I'll need to use a for loop for that, and maybe I should figure that part out later, but thought it was worth mentioning now. Finally, at some point, I might want to also calculate the average value of Class
. Would I be better of making a separate df for this as I can't see how else I could do this?
Upvotes: 0
Views: 927
Reputation: 26218
Do these steps in tidyverse
temp
date
else you have to tweak the code a little bitmap
and imap_dfr
as explained below you can execute the same code for each file only once and as a result you'll have a single dataframesetwd('my/path/here')
temp <- list.files(pattern = '*.csv')
library(tidyverse)
map(temp, read.csv) %>% setNames(gsub('.csv', '', temp)) %>%
imap_dfr(~ .x %>% group_by(item) %>%
summarise(Percentage = n()/nrow(df)*100,
Average_Value = mean(Value), .groups = 'drop') %>%
mutate(Date = .y))
If instead all your csvs contain date column do this
map_dfr(temp, ~read.csv(.x) %>% group_by(item, date) %>%
summarise(Percentage = n()/nrow(df)*100,
Average_Value = mean(Value), .groups = 'drop'))
Upvotes: 1
Reputation: 388972
List all the files that you want to use using list.files
. Combine them in one file using map_df
and for each Date
calculate the percentage of Item
present and it's average value.
library(tidyverse)
filenames <- list.files(pattern = '\\.csv$')
map_df(filenames, read_csv) %>%
group_by(Date, Item) %>%
summarise(Percentage = n(),
Average_Value = mean(Value)) %>%
mutate(Percentage = prop.table(Percentage) * 100) %>%
ungroup -> result
result
Upvotes: 0
Reputation: 39657
You can merge
the result of proportions
of table
with the result form aggregate
of the mean
of Value ~ Item
. In case the Date
also need to be there, this could be added using cbind
or data.frame
.
merge(aggregate(cbind(Average_Value = Value) ~ Item, x, mean)
, proportions(table(x$Item))*100
, by.y=1, by.x="Item")[c(1,3,2)]
# Item Freq Average_Value
#1 Bread 20 2.75
#2 Cereal 10 2.00
#3 Eggs 40 3.75
#4 Juice 30 2.50
Or using only aggregate
:
aggregate(Value ~ Item, x, function(y) c(Freq=length(y)/nrow(x)*100, Average=mean(y)))
# Item Value.Freq Value.Average
#1 Bread 20.00 2.75
#2 Cereal 10.00 2.00
#3 Eggs 40.00 3.75
#4 Juice 30.00 2.50
Upvotes: 1
Reputation: 11584
Does this work:
library(dplyr)
df %>% group_by(Item) %>% summarise(Percentage = n()/nrow(df)*100, Average_Value = mean(Value))
# A tibble: 4 x 3
Item Percentage Average_Value
<chr> <dbl> <dbl>
1 Bread 20 2.75
2 Cereal 10 2
3 Eggs 40 3.75
4 Juice 30 2.5
Upvotes: 1