user10675575
user10675575

Reputation:

Create a matrix out of a dataframe

I am very new to R and script writing in general. Please be patient if this is a very basic question. My search for a solution has not been successful.

date.depature <- c("2016.06.16", "2016.11.16", "2017.01.05", "2017.01.12", "2017.02.25")
airport.departure <- c("CDG", "QNY", "QXO", "CDG", "QNY")
airport.arrival <- c("SYD", "CDG", "QNY", "SYD", "QXO")
amount <- c("1", "3", "1", "10", "5")
df <- data.frame(date.depature, airport.departure, airport.arrival, amount)

I want to change the df to a matrix which has the airport.departure as rows and the airport.arrival as columns with the cumulated amount for a given month/years in the cells of the matrix.

Upvotes: 0

Views: 95

Answers (2)

R-Peys
R-Peys

Reputation: 163

using dplyr package:

library(dplyr)
df %>% mutate(month.departure =substr(date.depature, 1, 7)) %>%
  group_by(airport.departure, airport.arrival, month.departure) %>%
  summarize(total = sum(as.numeric(as.character(amount))))
  • mutate adds a column that gives the month/year (without the date, because you asked to group by month)
  • group_by says you want to see statistics grouped according to these variables
  • summarize computes the statistic you're looking for, in this case the total amount per 'group.' The as.numeric(as.character()) is included because 'amount' started as a factor type, which can't be summed [the as.character() is needed because as.numeric would have unexpected behavior if called directly on the factor]. total = gives an arbitrary column name to include in the output table; you could leave it out and this will still work.
  • the %>% is used to pipe output from one command as input to the next command in dplyr

Output:

output table

(format of output table is different than you described, but it contains all the info you're looking for. and dplyr is a great package to start learning for this sort of data manipulation!)

Upvotes: 0

Sotos
Sotos

Reputation: 51592

You are looking for xtabs, i.e.

xtabs(amount ~ airport.arrival + airport.departure, df)

which gives,

              airport.departure
airport.arrival CDG QNY QXO
            CDG   0   3   0
            QNY   0   0   1
            QXO   0   5   0
            SYD  11   0   0

P.S

As @Andre Elrico mentions, for some reason you declared your amount variable as a string. You need to convert to integer prior to calculating the sums

Proposed Solution:

xtabs(as.integer(amount) ~ airport.arrival + airport.departure, df)

Upvotes: 3

Related Questions