Reputation: 601
I guess that this is one of the easiest tasks, but I have no idea how to do it. I have a data.frame with a lot of data, a receipt number with an account number to which it was expensed.
ACCOUNT RECEIPT
10000 2121
12000 1515
21000 2121
50200 1515
47500 1474
90000 1474
I now want to search for the first receipt number and list all account numbers in a new column. I am currently not sure if the result should also list the account of the current receip, because one can see it anyway.
ACCOUNT RECEIPT RESULT
10000 2121 21000
12000 1515 50200
21000 2121 10000
50200 1515 12000
47500 1474 90000, 140000
90000 1474 47500, 140000
140000 1474 47500, 90000
I really like to use dplyr, maybe it is already possible to do it with it, I just can't see the solution..
Upvotes: 0
Views: 28
Reputation: 16121
I don't think you need multiple rows for each RECEIPT
in your final table.
I'd propose this solution
library(dplyr)
dt = read.table(text="ACCOUNT RECEIPT
10000 2121
12000 1515
21000 2121
50200 1515
47500 1474
90000 1474", header=T)
dt %>%
group_by(RECEIPT) %>%
summarise(ALL_ACCOUNTS = paste(ACCOUNT, collapse = ", "))
# # A tibble: 3 x 2
# RECEIPT ALL_ACCOUNTS
# <int> <chr>
# 1 1474 47500, 90000
# 2 1515 12000, 50200
# 3 2121 10000, 21000
Where, as you mentioned, you get one row for each RECEIPT
value and then all corresponding ACCOUNT
values.
To achieve exactly what you mentioned in your question try this
library(dplyr)
dt = read.table(text="ACCOUNT RECEIPT
10000 2121
12000 1515
21000 2121
50200 1515
47500 1474
90000 1474
140000 1474", header=T)
dt %>%
left_join(dt, by="RECEIPT") %>% # join same dataset to get all combinations of accounts
filter(ACCOUNT.x != ACCOUNT.y) %>% # filter out cases with same account numbers
group_by(ACCOUNT.x, RECEIPT) %>% # group by pairs of first account number and receipt
summarise(REST_ACCOUNTS = paste(ACCOUNT.y, collapse = ", ")) %>% # combine rest of account numbers
ungroup() %>% # forget the grouping
arrange(RECEIPT) %>% # order by receipt (only if needed for better visualisation)
rename(ACCOUNT = ACCOUNT.x) # change the name (only if needed for better visualisation)
# # A tibble: 7 x 3
# ACCOUNT RECEIPT REST_ACCOUNTS
# <int> <int> <chr>
# 1 47500 1474 90000, 140000
# 2 90000 1474 47500, 140000
# 3 140000 1474 47500, 90000
# 4 12000 1515 50200
# 5 50200 1515 12000
# 6 10000 2121 21000
# 7 21000 2121 10000
Upvotes: 1