I am struggling with several data-imputation packages in R and need your advice.
I have a dataset with 150,000 rows and 270 columns.
Every column has some missing data, but I only need to impute 7 of them. All columns should be taken into account when filling in those 7. There is no numerical data, only strings.
I have tried MICE, but it takes too long and never returns a result because I have to interrupt it. I suspect my code is completely wrong.
A | B | C | D | E |
------------------------------------------------------------------------
DEEP DIGGING ALL | 1989 | Digging | Sumer | Cups |
SURFACE DIGGING | 1989 | N/A | Sumer | Glasses |
CLAIMS OFFSHORE | 1990 | N/A | Assyria | N/A |
OFFSHORE CLAIMS | 1990 | Offshore | Assyria | N/A |
CLAIMS OFFSHORE | 1990 | Offshore | Assyria | Cups |
OFFSHORE CLAIMS | 1990 | Offshore | Assyria | Cups |
What I want is a table in which column "C" is imputed from all the other columns, while the N/As in column "E" are left as they are.
Desirable result:
A | B | C | D | E |
------------------------------------------------------------------------
DEEP DIGGING ALL | 1989 | Digging | Sumer | Cups |
SURFACE DIGGING | 1989 | Digging | Sumer | Glasses |
CLAIMS OFFSHORE | 1990 | Offshore | Assyria | N/A |
OFFSHORE CLAIMS | 1990 | Offshore | Assyria | N/A |
CLAIMS OFFSHORE | 1990 | Offshore | Assyria | Cups |
OFFSHORE CLAIMS | 1990 | Offshore | Assyria | Cups |
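If the goal is just to fill a few string columns from related columns, one deterministic alternative to MICE is group-wise mode imputation: each missing value in a target column gets the most frequent non-missing value among rows that agree on some key columns. This is a swapped-in technique, not MICE or missForest. The helper names `mode_value` and `impute_mode_by_group` are hypothetical, the choice of `B` and `D` as keys is an assumption for this toy table, and the question's N/A cells are stored as real `NA`. Only the columns listed in `targets` are changed, so the N/As in `E` stay untouched.

```r
# Hypothetical example data: the question's input table, with "N/A" stored as NA
df1 <- data.frame(
  A = c("DEEP DIGGING ALL", "SURFACE DIGGING", "CLAIMS OFFSHORE",
        "OFFSHORE CLAIMS", "CLAIMS OFFSHORE", "OFFSHORE CLAIMS"),
  B = c(1989, 1989, 1990, 1990, 1990, 1990),
  C = c("Digging", NA, NA, "Offshore", "Offshore", "Offshore"),
  D = c("Sumer", "Sumer", "Assyria", "Assyria", "Assyria", "Assyria"),
  E = c("Cups", "Glasses", NA, NA, "Cups", "Cups"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: most frequent non-missing value of x
# (ties go to the alphabetically first value); NA if x has no values
mode_value <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_character_)
  tab <- table(x)
  names(tab)[which.max(tab)]
}

# Hypothetical helper: replace NA in each target column with that column's
# mode among rows sharing the same key values; other columns are not modified
impute_mode_by_group <- function(df, targets, keys) {
  group_id <- interaction(df[keys], drop = TRUE)
  for (col in targets) {
    df[[col]] <- ave(df[[col]], group_id, FUN = function(v) {
      v[is.na(v)] <- mode_value(v)
      v
    })
  }
  df
}

result <- impute_mode_by_group(df1, targets = "C", keys = c("B", "D"))
print(result)
```

For the real data, `targets` would hold the seven column names. Using all 270 columns as keys would make most groups tiny or unique, so in practice the keys would be a handful of informative columns.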
I'm not sure whether MICE is the right path to take, and my attempts with missForest did not get anywhere either, so I really depend on your help.
Many thanks in advance!
We can use `fill` from tidyr, grouped by `B`:
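```r
library(dplyr)
library(tidyr)

df1 %>%
  group_by(B) %>%
  # fill() only replaces true NA values; if C holds the string "N/A",
  # convert it first, e.g. with mutate(C = na_if(C, "N/A"))
  fill(C, .direction = 'updown') %>%  # fill up, then down, within each B
  ungroup()
```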
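For anyone who would rather not depend on the tidyverse, the same grouped "up, then down" fill can be sketched in base R with `ave()`. The helper names `fill_down` and `fill_updown` are hypothetical, and the data frame holds only the two columns the fill needs, with N/A stored as `NA`. Like `fill()`, it only replaces real `NA` values and depends on row order within each group.

```r
# Hypothetical data: only the columns the fill needs, with "N/A" stored as NA
df1 <- data.frame(
  B = c(1989, 1989, 1990, 1990, 1990, 1990),
  C = c("Digging", NA, NA, "Offshore", "Offshore", "Offshore"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: carry the last non-missing value downward
fill_down <- function(v) {
  for (i in seq_along(v)[-1]) {
    if (is.na(v[i])) v[i] <- v[i - 1]
  }
  v
}

# Hypothetical helper: fill upward first (reverse, fill down, reverse back),
# then fill downward, mirroring .direction = 'updown'
fill_updown <- function(v) {
  fill_down(rev(fill_down(rev(v))))
}

# Apply the fill separately within each value of B
df1$C <- ave(df1$C, df1$B, FUN = fill_updown)
print(df1)
```

Note that this approach, like the `fill()` call, only looks at `B` and at neighbouring rows when deciding what goes into `C`; it does not use the other columns, and sorting the data differently can change which value is carried into a gap.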