Reputation: 221
I have a table, where data is stored in a following pattern:
ID Account
1 A
1 B
2 B
3 A
4 A
4 B
I would like to get one row per ID in a following way:
ID Account_A Account_B
1 1 1
2 0 1
3 1 0
4 1 1
0s and 1s don't matter that much, can be As, Bs etc. The main problem is to get data into one row per ID so I can merge it with another table.
I am not too familiar with R, so simpler solutions without too many new libraries are preferred.
Upvotes: 1
Views: 938
Reputation: 56239
We need table()
, then prettify to match the expected output.
Note: I am using new example data to match OP's request regarding duplicated rows.
PrzeM commented: One more question - is it possible to combine this code with counting number of occurences? In another data frame I have a similar layout, but for given example Account "A" can occur multiple times for same ID, then I would like to get a similar summary, but with a result similar to using COUNT() function in Excel.
# based on OP comments new example data
df1 <- read.table(text = "ID Account
1 A
1 A
9 B
9 B
3 A
4 A
4 B", header = TRUE)
# get table and convert to a dataframe
myTable <- table(df1)
res <- as.data.frame.matrix(myTable)
res
# A B
# 1 2 0
# 3 1 0
# 4 1 1
# 9 0 2
# update column names for Accounts
colnames(res) <- paste("Account", colnames(res), sep = "_")
# ID is in rownames, add it back as new column
res$ID <- rownames(res)
# reorder columns to get desired output
res <- res[, c(3, 1, 2)]
rownames(res) <- NULL
res
# ID Account_A Account_B
# 1 1 2 0
# 2 3 1 0
# 3 4 1 1
# 4 9 0 2
Upvotes: 1
Reputation: 81733
Here is a solution with the base R function reshape
.
The data:
dat <- data.frame(ID = c(1, 1, 2, 3, 4, 4), Account = c("A", "B", "B", "A", "A", "B"))
Transform the data to the wide format with reshape
.
dat_wide <- reshape(dat, direction = "wide", v.names = "Account", idvar = "ID",
timevar = "Account", sep = "_")
dat_wide
# ID Account_A Account_B
# 1 1 A B
# 3 2 <NA> B
# 4 3 A <NA>
# 5 4 A B
The values can be replaced with 1
and 0
:
dat_wide[-1] <- as.integer(!is.na(dat_wide[-1]))
# ID Account_A Account_B
# 1 1 1 1
# 3 2 0 1
# 4 3 1 0
# 5 4 1 1
Upvotes: 2