Dhruv Ghulati

Reputation: 3026

Combining columns of a table based on age range

I have a table in R that looks like (below is just a sample):

|       | 15 | 17 | 18 | 22 | 25 | 26 | 27 | 29 | 
|-------|----|----|----|----|----|----|----|----|
| 10000 | 1  | 2  | 1  | 2  | 4  | 3  | 5  | 2  |
| 20000 | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  |
| 30000 | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  |
| 40000 | 0  | 0  | 0  | 1  | 2  | 3  | 6  | 3  |
| 50000 | 0  | 0  | 0  | 0  | 0  | 0  | 1  | 1  |
| 60000 | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  |

The rows are income levels and the columns are ages. I am creating this table to see whether age is related to income via a Chi-squared test. The numbers in the table are counts of occurrences, e.g. there are 2 people aged 17 in my dataset with an income of 10000.

Both age and income level are of type "num" in R, so they are treated as continuous.

I want to combine the age columns so that I get a table counting everyone who has an income of 10k and is aged 15-25, 25-35, etc., so that I end up with far fewer columns.

Note also that colnames(tbl) is "15", "17", "18", and so on, not "Age". I haven't defined an overarching name for my columns and rows.

I note this answer does something similar, but I'm not sure how to apply it given that I don't have a name for my columns, e.g. "mpg" (in the case of the link).

Any ideas?

Upvotes: 0

Views: 319

Answers (1)

Nightwriter

Reputation: 524

I made my own matrix here, but this should work for data frames as well.

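# Example data: 100 rows, one column per age from 15 to 99
mat <- matrix(sample(1:10, 8500, replace = TRUE), ncol = 85)
colnames(mat) <- 15:99
# Assign each age column to a 10-year bin: [15,25), [25,35), ...
levs <- cut(as.numeric(colnames(mat)), seq(15, 105, 10), right = FALSE)
# Sum the columns within each bin; drop = FALSE keeps a one-column bin as a matrix
res <- sapply(as.character(unique(levs)), function(x) rowSums(mat[, levs == x, drop = FALSE]))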
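To see what the cut() step does on its own, here is a small sketch using the ages from the question's sample table. The break points 15, 25, 35 are an assumption chosen for illustration, not something the question specifies.

```r
# Ages taken from the column names of the sample table in the question
ages <- c(15, 17, 18, 22, 25, 26, 27, 29)

# right = FALSE makes each interval closed on the left and open on the right,
# so age 25 lands in [25,35) rather than in the first bin
levs <- cut(ages, breaks = seq(15, 35, 10), right = FALSE)

# One bin label per age, then the number of ages falling in each bin
as.character(levs)
table(levs)
```

With right = FALSE the bins do not overlap at the boundaries, which matters when "15-25" and "25-35" share an endpoint as in the question.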

Edit: If you want the same colnames as in mat, but with the counts of the category each column belongs to, additionally run:

# Expand res to one category-count column per original column in mat.
# Indexing by as.character(levs) selects columns by name rather than by factor code.
res <- res[, as.character(levs)]
colnames(res) <- colnames(mat) # rename columns to match the input matrix mat
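As a rough sketch, the same approach can be applied to the sample table from the question. The object name tbl comes from the question; the break points (15, 25, 35) and the name binned are assumptions for illustration, and the table is rebuilt by hand here only so the example is self-contained.

```r
# Rebuild the sample table from the question as a matrix
# (rows = income levels, columns = ages)
tbl <- matrix(
  c(1, 2, 1, 2, 4, 3, 5, 2,
    0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 1, 2, 3, 6, 3,
    0, 0, 0, 0, 0, 0, 1, 1,
    0, 0, 0, 0, 0, 0, 0, 0),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    c("10000", "20000", "30000", "40000", "50000", "60000"),
    c("15", "17", "18", "22", "25", "26", "27", "29")
  )
)

# The column names are the ages, so no overarching "Age" name is needed
ages <- as.numeric(colnames(tbl))
bins <- cut(ages, breaks = seq(15, 35, 10), right = FALSE)

# Sum the age columns within each bin; drop = FALSE guards against
# a bin that contains only a single column
binned <- sapply(levels(bins), function(b) rowSums(tbl[, bins == b, drop = FALSE]))
binned
```

The result has one row per income level and one column per age bin, with the bin labels as column names.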

Upvotes: 1
