Reputation: 223
I have a data frame as follows:
days target probability
1 75 0.80 0.9060341
2 100 0.90 0.75
df <- structure(list(days = c(75, 100, 120, 150, 200, 300, 75, 100,
120, 150, 200, 300, 75, 100, 120, 150, 200, 300, 75, 100, 120,
150, 200, 300, 75, 100, 120, 150, 200, 300, 75, 100, 120, 150,
200, 300), target = c(0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.9, 0.9,
0.9, 0.9, 0.9, 0.9, 1, 1, 1, 1, 1, 1, 1.05, 1.05, 1.05, 1.05,
1.05, 1.05, 1.1, 1.1, 1.1, 1.1, 1.1, 1.1, 1.2, 1.2, 1.2, 1.2,
1.2, 1.2), probability = c(0.90603410539241, 0.90603410539241,
0.90603410539241, 0.90603410539241, 0.90603410539241, 0.904213051602258,
0.733995206180212, 0.733995206180212, 0.733995206180212, 0.733995206180212,
0.733995206180212, 0.731795453278156, 0.512082243536284, 0.512082243536284,
0.512082243536284, 0.512082243536284, 0.512082243536284, 0.511492313399902,
0.390943562448882, 0.390943562448882, 0.390943562448882, 0.390943562448882,
0.390943562448882, 0.391451116324459, 0.282452594645645, 0.282452594645645,
0.282452594645645, 0.282452594645645, 0.282452594645645, 0.283766337160544,
0.106271449405461, 0.106271449405461, 0.106271449405461, 0.106271449405461,
0.106271449405461, 0.107778317673786)), .Names = c("days", "target",
"probability"), class = "data.frame", row.names = c(1L, 2L, 3L,
4L, 5L, 7L, 9L, 10L, 11L, 12L, 13L, 15L, 17L, 18L, 19L, 20L,
21L, 23L, 25L, 26L, 27L, 28L, 29L, 31L, 33L, 34L, 35L, 36L, 37L,
40L, 43L, 44L, 45L, 46L, 47L, 49L))
And would like to have a single row emitted in a CSV file with the following headers:
day75_target0.80, day100_target0.9, and so forth -- the values in that row should just be the corresponding probabilities.
Thoughts?
Upvotes: 3
Views: 229
Reputation: 1731
That's not the most attractive thing to do to your poor data, but taking it at face value, this is easy with the tidyverse.
library(tidyverse)
#first create the columns:
> df %>% unite(daytarg, days, target, sep = "_target") %>% head
daytarg probability
1 75_target0.8 0.9060341
2 100_target0.8 0.9060341
3 120_target0.8 0.9060341
4 150_target0.8 0.9060341
5 200_target0.8 0.9060341
7 300_target0.8 0.9042131
It seems sensible to check that we will have unique columns:
> df %>% unite(daytarg, days, target, sep = "_target") %>% count(daytarg) %>% filter(n > 1)
# A tibble: 0 x 2
# ... with 2 variables: daytarg <chr>, n <int>
Okay, good. Now we can add a spread:
> df %>%
unite(daytarg, days, target, sep = "_target") %>%
spread(daytarg, probability) %>%
write_csv("output.csv")
So all this does is create the desired name from the desired columns, then turn those names into columns with probability as the value. With anything like this, make sure the combinations are unique first.
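For what it's worth, on tidyr 1.0.0 and later `spread()` is superseded by `pivot_wider()`. Here is a minimal sketch of the same idea, assuming a four-row subset of the question's data (`df_small` is a made-up name) and a temporary output path. It also puts the `day` prefix on the names, which the question asked for.

```r
library(dplyr)
library(tidyr)
library(readr)

# Assumption: a small subset of the question's data so the sketch runs on its own
df_small <- data.frame(
  days = c(75, 300, 75, 300),
  target = c(0.8, 0.8, 0.9, 0.9),
  probability = c(0.90603410539241, 0.904213051602258,
                  0.733995206180212, 0.731795453278156)
)

wide <- df_small %>%
  # Build the column name from days and target, with the "day" prefix
  mutate(name = paste0("day", days, "_target", target)) %>%
  select(name, probability) %>%
  # One column per name, filled with the matching probability
  pivot_wider(names_from = name, values_from = probability)

# Assumption: a temporary file stands in for the real output path
out_file <- tempfile(fileext = ".csv")
write_csv(wide, out_file)
```

If the names are not unique, `pivot_wider()` warns and returns list-columns, so the uniqueness check above is still worth running first.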
Upvotes: 0
Reputation: 107767
Consider this base R approach: concatenate the fields, then transpose the data frame:
# CONCATENATING DAYS AND TARGETS FIELDS
newdf <- data.frame(daystarget = paste0("day", df$days, "_target", df$target),
probability = df$probability, stringsAsFactors=FALSE)
# ROUND PROBABILITY TO ONE DIGIT
newdf$probability <- round(as.numeric(newdf$probability), 1)
# TRANSPOSE DATA FRAME
finaldf <- data.frame(t(newdf),stringsAsFactors=F)
# RENAME COLUMNS TO FIRST ROW
names(finaldf) <- finaldf[1,]
# REMOVE PREVIOUS FIRST ROW
finaldf <- finaldf[2,]
# RESET ROW NAMES
row.names(finaldf) <- 1:nrow(finaldf)
write.csv(finaldf, "FinalDF.csv", row.names=F)
# day75_target0.8 day100_target0.8 day120_target0.8 day150_target0.8 ...
#1 0.9 0.9 0.9 0.9 ...
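A shorter variant of the same idea, as a sketch rather than a drop-in replacement: name the probability vector and transpose that, which keeps the values numeric. `df_small`, `col_names`, `one_row`, and the temporary output path are assumptions for illustration. Rounding is left out here, so wrap the probabilities in `round()` if you want it.

```r
# Assumption: a small subset of the question's data so the sketch runs on its own
df_small <- data.frame(
  days = c(75, 300, 75, 300),
  target = c(0.8, 0.8, 0.9, 0.9),
  probability = c(0.90603410539241, 0.904213051602258,
                  0.733995206180212, 0.731795453278156)
)

# Build the header names and make sure none repeats
col_names <- paste0("day", df_small$days, "_target", df_small$target)
stopifnot(!anyDuplicated(col_names))

# t() of a named vector gives a 1 x n matrix whose column names are the names;
# check.names = FALSE stops data.frame() from altering them
one_row <- data.frame(t(setNames(df_small$probability, col_names)),
                      check.names = FALSE)

# Assumption: a temporary file stands in for the real output path
out_file <- tempfile(fileext = ".csv")
write.csv(one_row, out_file, row.names = FALSE)
```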
Upvotes: 1