Reputation: 697
Once again I'm a little stuck and reaching out for help. I hope one day being able to give this help back...
Anyways, I have a tibble that looks like this:
# A tibble: 20 x 6
# Groups: tipologia [6]
tipologia date_info pct_day pct_month pct_year pct_no_date
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Aree soggette a crolli/ribaltamenti diffusi day 0.0508 NA NA NA
2 Aree soggette a crolli/ribaltamenti diffusi month NA 0.0217 NA NA
3 Aree soggette a crolli/ribaltamenti diffusi no date NA NA NA 0.227
4 Aree soggette a crolli/ribaltamenti diffusi year NA NA 0.701 NA
5 Aree soggette a frane superficiali diffuse day 0.0721 NA NA NA
6 Aree soggette a frane superficiali diffuse month NA 0.0218 NA NA
7 Aree soggette a frane superficiali diffuse no date NA NA NA 0.570
8 Aree soggette a frane superficiali diffuse year NA NA 0.336 NA
9 Aree soggette a sprofondamenti diffusi day 0.143 NA NA NA
10 Aree soggette a sprofondamenti diffusi no date NA NA NA 0.286
11 Aree soggette a sprofondamenti diffusi year NA NA 0.571 NA
12 Colamento lento day 0.119 NA NA NA
13 Colamento lento month NA 0.0475 NA NA
14 Colamento lento no date NA NA NA 0.122
15 Colamento lento year NA NA 0.712 NA
16 Colamento rapido day 0.478 NA NA NA
17 Colamento rapido month NA 0.00838 NA NA
18 Colamento rapido no date NA NA NA 0.0642
19 Colamento rapido year NA NA 0.450 NA
20 Complesso day 0.262 NA NA NA
There are four entries in "tipologia" as there are four possible date-informations (day, year, month or no info at all). What I'd like to have is only one row for each tipologia and essentially removing this unnecessary NAs. The NAs can't have any values, so they are just a little annoying.
I tried a lot with grouping again and summarizing, but did not achieve what I wanted to do. So any idea would be super helpful:)
Upvotes: 0
Views: 65
Reputation: 73342
You could use aggregate
and loop over the columns with lapply
, then merge
.
Reduce(function(...) merge(..., all=T), lapply(names(dat)[3:6], function(x)
aggregate(as.formula(paste(x, "~ tipologia")), dat, I)))
# tipologia pct_day pct_month pct_year pct_no_date
# 1 Aree soggette a crolli/ribaltamenti diffusi 0.0508 0.02170 0.701 0.2270
# 2 Aree soggette a frane superficiali diffuse 0.0721 0.02180 0.336 0.5700
# 3 Aree soggette a sprofondamenti diffusi 0.1430 NA 0.571 0.2860
# 4 Colamento lento 0.1190 0.04750 0.712 0.1220
# 5 Colamento rapido 0.4780 0.00838 0.450 0.0642
# 6 Complesso 0.2620 NA NA NA
Data:
dat <- structure(list(tipologia = c("Aree soggette a crolli/ribaltamenti diffusi",
"Aree soggette a crolli/ribaltamenti diffusi", "Aree soggette a crolli/ribaltamenti diffusi",
"Aree soggette a crolli/ribaltamenti diffusi", "Aree soggette a frane superficiali diffuse",
"Aree soggette a frane superficiali diffuse", "Aree soggette a frane superficiali diffuse",
"Aree soggette a frane superficiali diffuse", "Aree soggette a sprofondamenti diffusi",
"Aree soggette a sprofondamenti diffusi", "Aree soggette a sprofondamenti diffusi",
"Colamento lento", "Colamento lento", "Colamento lento", "Colamento lento",
"Colamento rapido", "Colamento rapido", "Colamento rapido", "Colamento rapido",
"Complesso"), date_info = c("day", "month", "no date", "year",
"day", "month", "no date", "year", "day", "no date", "year",
"day", "month", "no date", "year", "day", "month", "no date",
"year", "day"), pct_day = c(0.0508, NA, NA, NA, 0.0721, NA, NA,
NA, 0.143, NA, NA, 0.119, NA, NA, NA, 0.478, NA, NA, NA, 0.262
), pct_month = c(NA, 0.0217, NA, NA, NA, 0.0218, NA, NA, NA,
NA, NA, NA, 0.0475, NA, NA, NA, 0.00838, NA, NA, NA), pct_year = c(NA,
NA, NA, 0.701, NA, NA, NA, 0.336, NA, NA, 0.571, NA, NA, NA,
0.712, NA, NA, NA, 0.45, NA), pct_no_date = c(NA, NA, 0.227,
NA, NA, NA, 0.57, NA, NA, 0.286, NA, NA, NA, 0.122, NA, NA, NA,
0.0642, NA, NA)), class = "data.frame", row.names = c("1", "2",
"3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14",
"15", "16", "17", "18", "19", "20"))
Upvotes: 0
Reputation: 389155
You can use na.omit
to drop the NA
values.
library(dplyr)
df %>%
group_by(tipologia) %>%
summarise(across(starts_with('pct'), na.omit))
na.omit
should work for the above data but a little safer option would be :
df %>%
group_by(tipologia) %>%
summarise(across(starts_with('pct'), ~.x[!is.na(x)][1]))
Upvotes: 2