Erik Steiner
Erik Steiner

Reputation: 601

Write a search result of a data frame into a new column

I guess that this is one of the easiest tasks, but I have no idea how to do it. I have a data.frame with a lot of data, a receipt number with an account number to which it was expensed.

ACCOUNT RECEIPT
10000   2121
12000   1515
21000   2121
50200   1515
47500   1474
90000   1474

I now want to search for the first receipt number and list all account numbers in a new column. I am currently not sure if the result should also list the account of the current receip, because one can see it anyway.

ACCOUNT RECEIPT RESULT
  10000    2121 21000        
  12000    1515 50200
  21000    2121 10000
  50200    1515 12000
  47500    1474 90000, 140000
  90000    1474 47500, 140000
 140000    1474 47500, 90000

I really like to use dplyr, maybe it is already possible to do it with it, I just can't see the solution..

Upvotes: 0

Views: 28

Answers (1)

AntoniosK
AntoniosK

Reputation: 16121

I don't think you need multiple rows for each RECEIPT in your final table. I'd propose this solution

library(dplyr)

dt = read.table(text="ACCOUNT RECEIPT
10000   2121
12000   1515
21000   2121
50200   1515
47500   1474
90000   1474", header=T)

dt %>%
  group_by(RECEIPT) %>%
  summarise(ALL_ACCOUNTS = paste(ACCOUNT, collapse = ", "))

# # A tibble: 3 x 2
#   RECEIPT ALL_ACCOUNTS
#     <int>        <chr>
# 1    1474 47500, 90000
# 2    1515 12000, 50200
# 3    2121 10000, 21000

Where, as you mentioned, you get one row for each RECEIPT value and then all corresponding ACCOUNT values.

To achieve exactly what you mentioned in your question try this

library(dplyr)

dt = read.table(text="ACCOUNT RECEIPT
                10000   2121
                12000   1515
                21000   2121
                50200   1515
                47500   1474
                90000   1474
                140000  1474", header=T)

dt %>%
  left_join(dt, by="RECEIPT") %>%            # join same dataset to get all combinations of accounts
  filter(ACCOUNT.x != ACCOUNT.y) %>%         # filter out cases with same account numbers
  group_by(ACCOUNT.x, RECEIPT) %>%           # group by pairs of first account number and receipt
  summarise(REST_ACCOUNTS = paste(ACCOUNT.y, collapse = ", ")) %>%   # combine rest of account numbers
  ungroup() %>%                              # forget the grouping
  arrange(RECEIPT) %>%                       # order by receipt (only if needed for better visualisation)
  rename(ACCOUNT = ACCOUNT.x)                # change the name (only if needed for better visualisation)

# # A tibble: 7 x 3
#   ACCOUNT RECEIPT REST_ACCOUNTS
#     <int>   <int>         <chr>
# 1   47500    1474 90000, 140000
# 2   90000    1474 47500, 140000
# 3  140000    1474  47500, 90000
# 4   12000    1515         50200
# 5   50200    1515         12000
# 6   10000    2121         21000
# 7   21000    2121         10000

Upvotes: 1

Related Questions