prao
prao

Reputation: 93

Working with grouped data in R

If I have a dataset in R which looks like this below:

ProjName,ProjLevel,Budget
Proj1,Hardware,$100
,Software,$100
,Total,$200
Proj2,Hardware,$200
,Software,$300
,Other,$400
,Total,$900
...
...

And I want the data to look like this:

ProjName,ProjLevel,Budget
Proj1,Hardware,$100
Proj1,Software,$100
Proj1,Total,$200
Proj2,Hardware,$200
Proj2,Software,$300
Proj2,Other,$400
Proj2,Total,$900

I am not sure if this is the best approach. Ideally, I would like the "Total" budget amounts for each project. I thought maybe I can use the apply() on the column but not sure the best way to approach the task.

Thanks, prao

Upvotes: 3

Views: 385

Answers (2)

joran
joran

Reputation: 173577

If you read in the data using na.strings = "", you can then use the na.locf function in the zoo package to fill in the missing values with the most recent non-missing one:

my.table <-read.table(text="ProjName,ProjLevel,Budget
+ Proj1,Hardware,$100
+ ,Software,$100
+ ,Total,$200
+ Proj2,Hardware,$200
+ ,Software,$300
+ ,Other,$400
+ ,Total,$900", header=TRUE, as.is=TRUE,sep = ",",na.strings = "")

my.table$ProjName <- na.locf(my.table$ProjName)
> my.table
  ProjName ProjLevel Budget
1    Proj1  Hardware   $100
2    Proj1  Software   $100
3    Proj1     Total   $200
4    Proj2  Hardware   $200
5    Proj2  Software   $300
6    Proj2     Other   $400
7    Proj2     Total   $900

Upvotes: 3

Pierre Lapointe
Pierre Lapointe

Reputation: 16277

Without using additional packages, I would use a loop and replace empty cells with the previous non-empty cell

my.table <-read.table(text="ProjName,ProjLevel,Budget
Proj1,Hardware,$100
,Software,$100
,Total,$200
Proj2,Hardware,$200
,Software,$300
,Other,$400
,Total,$900", header=TRUE, as.is=TRUE,sep = ",")

for (i in 1:nrow(my.table)){
if(my.table[i,1]=="") my.table[i,1] <-my.table[i-1,1]
}

gives:

> my.table
  ProjName ProjLevel Budget
1    Proj1  Hardware   $100
2    Proj1  Software   $100
3    Proj1     Total   $200
4    Proj2  Hardware   $200
5    Proj2  Software   $300
6    Proj2     Other   $400
7    Proj2     Total   $900

To get the total amount by project:

my.table[my.table$ProjLevel=="Total",]

  ProjName ProjLevel Budget
3    Proj1     Total   $200
7    Proj2     Total   $900

Upvotes: 3

Related Questions