user2741700

Reputation: 901

Data Frame creating column from existing column only takes into account the first row

I have a data frame like this:

head(test)
                     sku               array                 
1 AQ665ELABLKLANID-81796       0,0,0,1,1,1,2            
2 AQ665ELABLKMANID-81797   2,0,0,0,1,1,0,0,1              
3 AQ665ELABLKNANID-81798     0,1,2,1,1,0,4,1           
4 AQ665ELABLKOANID-81799             0,1,0,1            
5 AQ665ELABLKPANID-81800     1,4,4,2,3,7,2,2             
6 AQ665ELABLKRANID-81802             0,1,1,0            

And I would like to add a column named first that contains for each row the first element of array:

test$first = strsplit(test$array,",")[[1]][1]

But what I get is the following:

head(test)
                     sku               array   first             
1 AQ665ELABLKLANID-81796       0,0,0,1,1,1,2   0            
2 AQ665ELABLKMANID-81797   2,0,0,0,1,1,0,0,1   0              
3 AQ665ELABLKNANID-81798     0,1,2,1,1,0,4,1   0           
4 AQ665ELABLKOANID-81799             0,1,0,1   0            
5 AQ665ELABLKPANID-81800     1,4,4,2,3,7,2,2   0             
6 AQ665ELABLKRANID-81802             0,1,1,0   0 

I don't understand why every row gets the value taken from the array of the first row only.

Upvotes: 0

Views: 47

Answers (2)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

I suppose some regex could also be of use here. Something along the lines of the following might come in handy:

gsub("(^[0-9]+)(,.*)", "\\1", test$array)
# [1] "0" "2" "0" "0" "1" "0"
gsub("(^.*?),(.*)", "\\1", test$array, perl=TRUE)
# [1] "0" "2" "0" "0" "1" "0"

There are some packages (like "stringi" and "stringr") that make this kind of stuff easier to do.

library(stringi)
stri_extract_first_regex(test$array, pattern="[0-9]+")
# [1] "0" "2" "0" "0" "1" "0"

This also lets you easily extract the last value with:

stri_extract_last_regex(test$array, pattern="[0-9]+")
# [1] "2" "1" "1" "1" "2" "0"
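If you would rather not load a package, a base R sketch along the same lines is possible with sub(). The vector x below is just the array column from the question retyped by hand, and this assumes the values never contain a comma themselves:

```r
# The `array` column from the question, retyped as a plain character vector.
x <- c("0,0,0,1,1,1,2", "2,0,0,0,1,1,0,0,1", "0,1,2,1,1,0,4,1",
       "0,1,0,1", "1,4,4,2,3,7,2,2", "0,1,1,0")

# Drop everything from the first comma onwards: keeps the first value.
first <- sub(",.*", "", x)

# ".*" is greedy, so ".*," matches up to the LAST comma: keeps the last value.
last <- sub(".*,", "", x)

first
# [1] "0" "2" "0" "0" "1" "0"
last
# [1] "2" "1" "1" "1" "2" "0"
```

Both results are character vectors; wrap them in as.integer() if you need numbers.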

Upvotes: 1

thelatemail

Reputation: 93813

I think you actually want:

test$first <- sapply(strsplit(test$array,","),"[",1)
test

#                     sku             array first
#1 AQ665ELABLKLANID-81796     0,0,0,1,1,1,2     0
#2 AQ665ELABLKMANID-81797 2,0,0,0,1,1,0,0,1     2
#3 AQ665ELABLKNANID-81798   0,1,2,1,1,0,4,1     0
#4 AQ665ELABLKOANID-81799           0,1,0,1     0
#5 AQ665ELABLKPANID-81800   1,4,4,2,3,7,2,2     1
#6 AQ665ELABLKRANID-81802           0,1,1,0     0

In your attempt,

strsplit(test$array,",")[[1]]

gives you the split-apart version of test$array[1] only. You then subset its first element, which happens to be 0, and assigning that single value to test$first recycles it across every row. Hence, all your values end up being 0.
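To see this step by step, here is a small sketch on a three-row copy of the data (retyped by hand, with stringsAsFactors = FALSE so that array is character). The column name recycled is made up for illustration, and vapply() is used as a stricter variant of sapply(), not as the only way to do it:

```r
# A three-row copy of the data frame from the question.
test <- data.frame(
  sku   = c("AQ665ELABLKLANID-81796", "AQ665ELABLKMANID-81797",
            "AQ665ELABLKNANID-81798"),
  array = c("0,0,0,1,1,1,2", "2,0,0,0,1,1,0,0,1", "0,1,2,1,1,0,4,1"),
  stringsAsFactors = FALSE
)

# strsplit() returns a list with one character vector per row.
parts <- strsplit(test$array, ",")
length(parts)
# [1] 3

# [[1]] picks the vector for row 1 only; [1] picks its first element.
scalar <- parts[[1]][1]
scalar
# [1] "0"

# Assigning a length-1 value to a column repeats it for every row.
test$recycled <- scalar

# Taking the first element of EACH list entry gives one value per row.
# vapply() also checks that every result is a single character string.
test$first <- vapply(parts, function(p) p[1], character(1))
test$first
# [1] "0" "2" "0"
```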

Upvotes: 2
