Reputation: 3
I'm new to programming in R, and I've been stuck on this problem for several days. I started with a list that I created by splitting a file. Each element of the list holds a number of items in a single row.
> head(sales2)
$`7143443`
[1] "SSS-U-CCXVCSS1" "L-CCX-8GETTS-LIC"
$`7208993`
[1] "NFFGSR4=" "1MV-FT-1=" "VI-NT/TE="
$`7241758`
[1] "PW_SQSGG="
$`9273628`
[1] "O1941-SE9" "CCO887VA-K9" "2901-SEC/K9" "CO1941-C/K9"
$`9371709`
[1] "HGR__SASS=" "WWQTTB0S-L" "WS-RRRT48FP" "WTTTF24PS-L"
[5] "GEDQTT8TS-L" "WD-SRNS-2S-L"
$`9830473`
[1] "SPA$FFSB0S"
I wanted to convert it into a data frame, so I used
x <- do.call(rbind, lapply(sales2, data.frame))
It is converted to a data frame, but the result looks like this:
> head(x, 6)
id
7143443.1 "SSS-U-CCXVCSS1"
7143443.2 "L-CCX-8GETTS-LIC"
7208993.1 "NFFGSR4="
7208993.2 "1MV-FT-1="
7208993.3 "VI-NT/TE="
7241758 "PW_SQSGG="
I want all of 7143443's items in a single row, not spread over multiple rows.
From this I want to count how many rows contain two given items together. For example, in how many rows do "WS-C2960S-48TS-L" and "WS-C2960S-24TS-L" both appear? You could also describe this as the probability of that pair relative to all other elements.
Upvotes: 0
Views: 1268
Reputation: 39154
I am not sure what your final desired output is, but the following script converts your list to a data frame. Perhaps you can begin your analysis from this data frame.
# Create example list
sales2 <- list(`7143443` = c("SSS-U-CCXVCSS1", "L-CCX-8GETTS-LIC"),
               `7208993` = c("NFFGSR4=", "1MV-FT-1=", "VI-NT/TE="),
               `7241758` = "PW_SQSGG=",
               `9273628` = c("O1941-SE9", "CCO887VA-K9", "2901-SEC/K9", "CO1941-C/K9"),
               `9371709` = c("HGR__SASS=", "WWQTTB0S-L", "WS-RRRT48FP", "WTTTF24PS-L",
                             "GEDQTT8TS-L", "WD-SRNS-2S-L"),
               `9830473` = "SPA$FFSB0S")
# Load packages
library(dplyr)
library(purrr)
dat <- map(sales2, ~ tibble(Value = .x)) %>%        # Convert each list element to a one-column tibble named Value
  bind_rows(.id = "ID") %>%                         # Combine all tibbles; the list names go into ID
  group_by(ID) %>%                                  # Group by the first column
  summarise(Value = paste0(Value, collapse = " "))  # Collapse the second column to one string per ID
dat
# A tibble: 6 × 2
ID Value
<chr> <chr>
1 7143443 SSS-U-CCXVCSS1 L-CCX-8GETTS-LIC
2 7208993 NFFGSR4= 1MV-FT-1= VI-NT/TE=
3 7241758 PW_SQSGG=
4 9273628 O1941-SE9 CCO887VA-K9 2901-SEC/K9 CO1941-C/K9
5 9371709 HGR__SASS= WWQTTB0S-L WS-RRRT48FP WTTTF24PS-L GEDQTT8TS-L WD-SRNS-2S-L
6 9830473 SPA$FFSB0S
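The same one-row-per-ID table can also be built without extra packages. The following is only a minimal base R sketch, assuming `sales2` is a named list of character vectors as in the example; the name `dat_base` is made up for illustration, and the list is shortened so the snippet runs on its own.

```r
# Shortened copy of the example list, so this sketch is self-contained
sales2 <- list(`7143443` = c("SSS-U-CCXVCSS1", "L-CCX-8GETTS-LIC"),
               `7208993` = c("NFFGSR4=", "1MV-FT-1=", "VI-NT/TE="),
               `7241758` = "PW_SQSGG=")

# Paste the items of each ID into one string;
# vapply() checks that every ID yields exactly one string
dat_base <- data.frame(
  ID    = names(sales2),
  Value = unname(vapply(sales2, paste, character(1), collapse = " ")),
  stringsAsFactors = FALSE
)
dat_base
```

As with the dplyr version, the items end up in one space-separated string per ID, so this form is convenient for display but less convenient for matching individual items.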
After reading the original poster's comment, I decided to update my solution to count how many rows contain two specified string patterns.
Here one row is a unique ID, so I assume the request can be rephrased as "How many IDs contain two specified string patterns?" If this is the case, I would prefer not to collapse all the observations. After collapsing the observations to one ID per row, we would need a strategy to match the strings, such as regular expressions. I am not familiar with regular expressions, so I will leave that for others to provide solutions.
In addition, the original poster did not specify which two strings are targeted, so I use an approach in which users can replace the target strings case by case.
dat <- map(sales2, ~ tibble(Value = .x)) %>%  # Convert each list element to a one-column tibble named Value
  bind_rows(.id = "ID")                       # Combine all tibbles; the list names go into ID
# After this, there is no need to collapse the rows
# Set the target strings; users can change the strings here
target_string1 <- c("SSS-U-CCXVCSS1", "L-CCX-8GETTS-LIC")
dat2 <- dat %>%
  filter(Value %in% target_string1) %>%        # Keep rows matching one of the target strings
  distinct(ID, Value, .keep_all = TRUE) %>%    # Keep one row per exact ID/Value duplicate
  count(ID) %>%                                # Count the matched target strings per ID
  filter(n == length(target_string1)) %>%      # Keep only IDs that contain every target string
  select(ID)
dat2
# A tibble: 1 × 1
ID
<chr>
1 7143443
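The question also asks how often the pair occurs relative to everything else. One possible reading is the share of IDs that contain both items, which can be computed directly from the list. This is a rough base R sketch under that assumption; `targets` and `has_both` are illustrative names, not part of the original solution.

```r
# Example list from above, repeated so the sketch runs on its own
sales2 <- list(`7143443` = c("SSS-U-CCXVCSS1", "L-CCX-8GETTS-LIC"),
               `7208993` = c("NFFGSR4=", "1MV-FT-1=", "VI-NT/TE="),
               `7241758` = "PW_SQSGG=",
               `9273628` = c("O1941-SE9", "CCO887VA-K9", "2901-SEC/K9", "CO1941-C/K9"),
               `9371709` = c("HGR__SASS=", "WWQTTB0S-L", "WS-RRRT48FP", "WTTTF24PS-L",
                             "GEDQTT8TS-L", "WD-SRNS-2S-L"),
               `9830473` = "SPA$FFSB0S")

# Target strings; change these as needed
targets <- c("SSS-U-CCXVCSS1", "L-CCX-8GETTS-LIC")

# TRUE for each ID whose items include every target string (exact matches only)
has_both <- vapply(sales2, function(items) all(targets %in% items), logical(1))

sum(has_both)             # number of IDs containing both items
mean(has_both)            # share of all IDs containing both items
names(sales2)[has_both]   # the matching IDs
```

For the example list, one of the six IDs (7143443) contains both target strings, so the share is 1/6. Because `%in%` compares whole strings, no regular expressions are needed.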
Upvotes: 1