Reputation: 21
I have a data set like the subset below
col1 col2 col3
y
y
y y
y
y
y
In reality, my data set is describing drug overdoses. There is a "y" in the column for the drug that was in each persons system when they died. What I'm trying to do is change the "y" to the cooresponding column. for example, I want the example data set above to appear as
col1 col2 col3
col1
col2
col1 col2
col3
col2
col1
So that I can later combine these three columns to know what each person died of, but only looking in one column. Data set is called "Data1", so I know that I need at some point to put
Data1$col1
Data1$col2
Data1$col3
Someone please help.. I'm very new at this.
Upvotes: 1
Views: 95
Reputation: 5508
There are already some brillant answers, but I still write down here my solution inclusive the loop of columns:
df <- data.frame(col1 = c("y", NA, "y", NA, NA, "y"),
col2 = c(NA, "y", "y", NA, "y", NA),
col3 = c(NA, NA, NA, "y", NA, NA), stringsAsFactors=FALSE)
Notice: The stringAsFactors=FALSE
is important!
for (name in colnames(df)){
df[name][df[name]=="y"] = name
}
Then, you will get what you want:
> df
col1 col2 col3
1 col1 <NA> <NA>
2 <NA> col2 <NA>
3 col1 col2 <NA>
4 <NA> <NA> col3
5 <NA> col2 <NA>
6 col1 <NA> <NA>
Upvotes: 1
Reputation: 21
I was able to figure it out:
Data1$Col1 <- as.character(Data1$Col1)
Data1$Col1[Data1$Col1 %in% "y"] <- "Col1"
Thank you for all of your help!
Upvotes: 1
Reputation: 2290
If you add a column for a patient ID, you can reformat to a long format using data.table
's melt
function:
df <- data.frame(id = 1:6, col1 = c("y", NA, "y", NA, NA, "y"),
col2 = c(NA, "y", "y", NA, "y", NA),
col3 = c(NA, NA, NA, "y", NA, NA))
df1 <- melt(setDT(df), id.vars = "id", measure.vars = c("col1", "col2", "col3"))
df1 <- df1[order(id)][is.na(value) == FALSE]
then that will give you this:
> df1
id variable value
1: 1 col1 y
2: 2 col2 y
3: 3 col1 y
4: 3 col2 y
5: 4 col3 y
6: 5 col2 y
7: 6 col1 y
Upvotes: 2
Reputation: 409
In a column, you can check where values are and get back a vector of logicals that you next use to index said column. In our case:
Data1$col1 == "y"
would get you a vector like
## [1] TRUE FALSE TRUE FALSE FALSE TRUE
This you can use to access the "y"s in the column:
Data1$col1[Data1$col1 == "y"]
## [1] "y" "y" "y"
It reads out the entries of the column only where there is an "y", a bit boring in this case, but very useful in general, like when you use values in one column to manipulate another one. The line above is also a two-way-street: You can read out the values (like above), our you can use an assignment operator to assign the specified entries a new value:
Data1$col1[Data1$col1 == "y"] = "col1"
You can then repeat the code for each column and be done.
Upvotes: 1