Reputation: 25
I have a data frame like this, where column 1 has different values and ends with a country.
Col1 col2 Col3 col4
A 0 0 1
B 1 0 3
c 4 0 6
D 5 6 7
China na na na
A 0 1 3
B 2 4 5
C 3 5 6
D 1 2 3
E 5 3 3
England na na na
I want to achieve something like the following, so I need to create a new column with the country name manipulated until the row where the country name is is matched.
Col1 col2 Col3 col4 col5
A 0 0 1 China
B 1 0 3 china
c 4 0 6 china
D 5 6 7 china
China na na na china
A 0 1 3 England
B 2 4 5 England
C 3 5 6 England
D 1 2 3 England
E 5 3 3 England
England na na na England
...plus I have 40 other countries to work with. I am new to R and struggling with on how to achieve my desired result.
Upvotes: 2
Views: 109
Reputation: 887088
We can use replace
with na.locf
library(zoo)
df1$Col5 <- with(df1, na.locf(replace(Col1, nchar(Col1)==1, NA), fromLast=TRUE))
df1$Cpl5
#[1] "China" "China" "China" "China" "China" "England" "England"
#[8] "England" "England" "England" "England"
Or another option is data.table
library(data.table)
setDT(df1)[, Col5 := Col1[.N], cumsum(shift(nchar(Col1)>1, fill = TRUE))]
df1
# Col1 col2 Col3 col4 Col5
# 1: A 0 0 1 China
# 2: B 1 0 3 China
# 3: c 4 0 6 China
# 4: D 5 6 7 China
# 5: China na na na China
# 6: A 0 1 3 England
# 7: B 2 4 5 England
# 8: C 3 5 6 England
# 9: D 1 2 3 England
#10: E 5 3 3 England
#11: England na na na England
Upvotes: 0
Reputation: 2076
Another hacky method: Using Col1
as starting point, make all the observations having a single character NA
. Then fill the NA
s with the last value.
library(zoo)
df$col5 = df$Col1
df$col5 = ifelse(nchar(as.character(df$col5)) > 1, as.character(df$col5), NA)
df$col5 = na.locf(as.character(df$col5), fromLast = TRUE)
Upvotes: 1
Reputation: 43334
A tidyverse option:
library(tidyverse)
# convert all to appropriate types
df %>% mutate_all(as.character) %>% type_convert(na = 'na') %>%
# add column with Col1 value if a row only has one non-NA value, else NA
mutate(country = ifelse(rowSums(!is.na(.)) == 1, Col1, NA)) %>%
fill(country, .direction = 'up') %>% # replace NAs upwards with last non-NA value
filter(complete.cases(.)) # subset to rows with no NAs
## Col1 col2 Col3 col4 country
## 1 A 0 0 1 China
## 2 B 1 0 3 China
## 3 c 4 0 6 China
## 4 D 5 6 7 China
## 5 A 0 1 3 England
## 6 B 2 4 5 England
## 7 C 3 5 6 England
## 8 D 1 2 3 England
## 9 E 5 3 3 England
Upvotes: 1
Reputation: 10437
Start by creating the data structure (in the future you should provide code for doing this in your questions).
exd <- read.table(text = "Col1 col2 Col3 col4
A 0 0 1
B 1 0 3
c 4 0 6
D 5 6 7
China NA NA NA
A 0 1 3
B 2 4 5
C 3 5 6
D 1 2 3
E 5 3 3
England NA NA NA", header = TRUE)
Next, figure out which rows are the country boundaries, and extract the country names
country_boundary <- nchar(as.character(exd$Col1)) > 1
country_names <- exd$Col1[country_boundary]
Then generate codes for each country, shifting by one.
country_number <- c(0, cumsum(country_boundary)[-nrow(exd)])
Finally, use the country_number
and country_names
vectors to create the country ID column.
exd <- data.frame(exd,
Col5 = factor(country_number,
labels = country_names))
You may wish to remove the (now redundant) boundary rows:
exd <- exd[!country_boundary, ]
exd
# Col1 col2 Col3 col4 Col5
# 1 A 0 0 1 China
# 2 B 1 0 3 China
# 3 c 4 0 6 China
# 4 D 5 6 7 China
# 6 A 0 1 3 England
# 7 B 2 4 5 England
# 8 C 3 5 6 England
# 9 D 1 2 3 England
# 10 E 5 3 3 England
Upvotes: 3