Vijay Vaidyanathan

Reputation: 25

Apply function on specific columns in R

I have a CSV file, mapping.csv, that contains the following data:

category_list,Automotive & Sports,Blanks,Cleantech / Semiconductors,Entertainment,Health,Manufacturing,"News, Search and Messaging",Others,"Social, Finance, Analytics, Advertising"
,0,1,0,0,0,0,0,0,0
3D,0,0,0,0,0,1,0,0,0
3D Printing,0,0,0,0,0,1,0,0,0
3D Technology,0,0,0,0,0,1,0,0,0
Accounting,0,0,0,0,0,0,0,0,1
Active Lifestyle,0,0,0,0,1,0,0,0,0
Ad Targeting,0,0,0,0,0,0,0,0,1
Advanced Materials,0,0,0,0,0,1,0,0,0
Adventure Travel,1,0,0,0,0,0,0,0,0

I load it into a data frame called mapping:

mapping <- read.csv(file = "mapping.csv", stringsAsFactors = FALSE, sep = ",", check.names = FALSE)

The data loads as expected.
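To reproduce this without the file, the sample rows can be passed inline. This is a minimal sketch that assumes only the nine rows shown above; csv_text is an illustrative name, and read.csv's text argument stands in for the file path.

```r
# Inline copy of the sample rows from the question, so no file is needed.
# csv_text is a hypothetical name used only for this sketch.
csv_text <- 'category_list,Automotive & Sports,Blanks,Cleantech / Semiconductors,Entertainment,Health,Manufacturing,"News, Search and Messaging",Others,"Social, Finance, Analytics, Advertising"
,0,1,0,0,0,0,0,0,0
3D,0,0,0,0,0,1,0,0,0
3D Printing,0,0,0,0,0,1,0,0,0
3D Technology,0,0,0,0,0,1,0,0,0
Accounting,0,0,0,0,0,0,0,0,1
Active Lifestyle,0,0,0,0,1,0,0,0,0
Ad Targeting,0,0,0,0,0,0,0,0,1
Advanced Materials,0,0,0,0,0,1,0,0,0
Adventure Travel,1,0,0,0,0,0,0,0,0'

# check.names = FALSE keeps column names such as "Automotive & Sports" intact.
mapping <- read.csv(text = csv_text, stringsAsFactors = FALSE, check.names = FALSE)

dim(mapping)          # 9 rows, 10 columns
names(mapping)[2:10]  # the nine sector columns
```

Column 1 is the category label, so the 0/1 indicator columns are columns 2 through 10.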

I am trying to add a new column to this data frame that holds, for each row, the name of the column containing the 1. For example, for the 3D row the new column should get the value "Manufacturing". Each row contains exactly one 1.

When I run this command:

mapping$sector_names <- lapply(apply(mapping[2:9], 1, function(x) which(x=="1")),names)

it populates the sector_names column correctly.

The problem is that when I use apply on columns 2 through 10, it does not work: every value in sector_names is NULL.

mapping$sector_names <- lapply(apply(mapping[2:10], 1, function(x) which(x=="1")),names)

Strangely, when I use apply on columns 3 through 10, it works fine.

In short, apply does not work across columns 2 through 10, but any other range (2 through 9, 3 through 10, etc.) works.

The difference seems to be that apply returns the column name along with the column number when I use columns 2 through 9, but only the column number when I use columns 2 through 10.

For example, the output of apply(mapping[2:9], 1, function(x) which(x=="1")) looks like this for each row:

[[2]]
Blanks 
     8

The output of apply(mapping[2:10], 1, function(x) which(x=="1")), by contrast, looks like this for each row:

[[1]] 2
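The two output shapes above can be reproduced on a small data frame. The sketch below uses hypothetical toy data (toy, full and part are illustrative names) and relies on documented apply behaviour: when every call returns a vector of length 1 the result is simplified to a plain vector, and when the lengths differ the result is a list. With all indicator columns included, every row has exactly one match, so the simplified case applies and the per-element names are lost.

```r
# Hypothetical toy data: three indicator columns, exactly one 1 per row.
toy <- data.frame(A = c(1, 0, 0), B = c(0, 1, 0), C = c(0, 0, 1))

# All columns: every row has one match, so each call returns length 1
# and apply simplifies the result to a plain, unnamed integer vector.
full <- apply(toy, 1, function(x) which(x == 1))

# Without column C: row 3 has no match (length 0), so the lengths differ
# and apply returns a list whose elements keep their column names.
part <- apply(toy[1:2], 1, function(x) which(x == 1))

is.list(full)        # FALSE
is.list(part)        # TRUE
lapply(full, names)  # NULL for every element, as with mapping[2:10]
names(part[[1]])     # "A"
```

On this reading, the ranges 2 through 9 and 3 through 10 only appear to work because each leaves out a column that holds the 1 for some row, which forces the list result.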

Could anyone please help?

Upvotes: 0

Views: 824

Answers (1)

G. Grothendieck

Reputation: 269586

1) If a is the result of the apply call over columns 2 through 10 in the question, i.e. an integer vector of column positions, then just index the column names by it:

mapping$sector_names <- names(mapping)[-1][a]

2) Alternatively, define mapping1 to be the matrix holding the 0-1 part of mapping (i.e. all but the first column) and nc1 to be its number of columns. Multiplying that matrix by the vector 1, 2, ..., nc1 gives, for each row, the column index of its 1. Index the column names of mapping1 by that index vector. This uses no apply calls.

mapping1 <- as.matrix(mapping[-1])
nc1 <- ncol(mapping1)
mapping$sector_names <- colnames(mapping1)[mapping1 %*% seq_len(nc1)]

This gives:

> mapping$sector
[1] "Blanks"                                 
[2] "Manufacturing"                          
[3] "Manufacturing"                          
[4] "Manufacturing"                          
[5] "Social, Finance, Analytics, Advertising"
[6] "Health"                                 
[7] "Social, Finance, Analytics, Advertising"
[8] "Manufacturing"                          
[9] "Automotive & Sports"     
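Both approaches can be checked against each other on a small example. This is a sketch on hypothetical toy data rather than the original file; toy, by_apply and by_product are illustrative names, and it assumes exactly one 1 per row, as stated in the question.

```r
# Hypothetical toy data standing in for mapping: a label column plus
# three 0/1 indicator columns with exactly one 1 per row.
toy <- data.frame(
  category_list = c("x", "y", "z"),
  Red = c(0, 1, 0), Green = c(0, 0, 1), Blue = c(1, 0, 0),
  stringsAsFactors = FALSE
)

# Approach 1: apply over all indicator columns returns column positions,
# which then index the indicator column names.
a <- apply(toy[-1], 1, function(x) which(x == 1))
by_apply <- names(toy)[-1][a]

# Approach 2: each entry of the matrix product is the sum of j * indicator[j]
# over the row, which equals the position of that row's single 1.
m <- as.matrix(toy[-1])
idx <- m %*% seq_len(ncol(m))
by_product <- colnames(m)[idx]

by_apply                     # "Blue" "Red" "Green"
all(by_apply == by_product)  # TRUE
```

The matrix product depends on the one-1-per-row assumption: a row with two 1s would yield the sum of their positions, and a row with none would yield 0, so neither case would map to the right column name.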

Upvotes: 2
