Tapan

Reputation: 3

Not able to convert a list into a data frame

I'm new to programming in R, and I've been stuck on this problem for several days. I started with a list that I created by splitting a file. Each element of the list holds the items that belong to a single row.

> head(sales2)
$`7143443`
[1] "SSS-U-CCXVCSS1"   "L-CCX-8GETTS-LIC"

$`7208993`
[1] "NFFGSR4="  "1MV-FT-1="  "VI-NT/TE="

$`7241758`
[1] "PW_SQSGG="

$`9273628`
[1] "O1941-SE9" "CCO887VA-K9"    "2901-SEC/K9" "CO1941-C/K9"

$`9371709`
[1] "HGR__SASS=" "WWQTTB0S-L" "WS-RRRT48FP" "WTTTF24PS-L"
[5] "GEDQTT8TS-L"  "WD-SRNS-2S-L"

$`9830473`
[1] "SPA$FFSB0S"

I wanted to convert it into a data frame, so I used

x<-do.call(rbind, lapply(sales2,data.frame))

It does get converted into a data frame, but the result looks like this

> head(x,6)
                          id
7143443.1   "SSS-U-CCXVCSS1"
7143443.2   "L-CCX-8GETTS-LIC"
7208993.1   "NFFGSR4="
7208993.2   "1MV-FT-1="
7208993.3   "VI-NT/TE="
7241758     "PW_SQSGG="

I want all of 7143443's items in a single row, not spread across multiple rows.

My goal is to count how many rows contain two given items together. For example, in how many rows do "WS-C2960S-48TS-L" and "WS-C2960S-24TS-L" both appear? Put another way, I want the probability of this pair occurring relative to all other elements.

Upvotes: 0

Views: 1268

Answers (1)

www

Reputation: 39154

I am not sure what your final desired output is, but the following script converts your list to a data frame. Perhaps you can begin your analysis from this data frame.

# Create example list
sales2 <- list(`7143443` = c("SSS-U-CCXVCSS1", "L-CCX-8GETTS-LIC"),
            `7208993` = c("NFFGSR4=", "1MV-FT-1=", "VI-NT/TE="),
            `7241758` = "PW_SQSGG=",
            `9273628` = c("O1941-SE9", "CCO887VA-K9", "2901-SEC/K9", "CO1941-C/K9"),
            `9371709` = c("HGR__SASS=", "WWQTTB0S-L", "WS-RRRT48FP", "WTTTF24PS-L",
                          "GEDQTT8TS-L", "WD-SRNS-2S-L"),
            `9830473` = "SPA$FFSB0S")

# Load packages
library(dplyr)
library(purrr)

dat <- map(sales2, ~ tibble(Value = .x)) %>%          # Convert each list element to a one-column tibble
  bind_rows(.id = "ID") %>%                           # Combine all data frames, storing list names in ID
  group_by(ID) %>%                                    # Group by the ID column
  summarise(Value = paste0(Value, collapse = " "))    # Collapse each ID's values into one string

dat
# A tibble: 6 × 2
       ID                                                                  Value
    <chr>                                                                  <chr>
1 7143443                                        SSS-U-CCXVCSS1 L-CCX-8GETTS-LIC
2 7208993                                           NFFGSR4= 1MV-FT-1= VI-NT/TE=
3 7241758                                                              PW_SQSGG=
4 9273628                          O1941-SE9 CCO887VA-K9 2901-SEC/K9 CO1941-C/K9
5 9371709 HGR__SASS= WWQTTB0S-L WS-RRRT48FP WTTTF24PS-L GEDQTT8TS-L WD-SRNS-2S-L
6 9830473                                                             SPA$FFSB0S
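
For comparison, the same one-row-per-ID table can probably be built in base R without loading any packages. This is only a minimal sketch, assuming `sales2` is a named list of character vectors as in the example; the name `dat_base` is made up for illustration, and the list is shortened and repeated here so the block runs on its own.

```r
# Shortened copy of the example list so this block is self-contained
sales2 <- list(`7143443` = c("SSS-U-CCXVCSS1", "L-CCX-8GETTS-LIC"),
               `7208993` = c("NFFGSR4=", "1MV-FT-1=", "VI-NT/TE="),
               `7241758` = "PW_SQSGG=")

# Collapse each list element into one space-separated string
collapsed <- vapply(sales2, paste, character(1), collapse = " ")

# One row per ID; dat_base is a hypothetical name, not from the original post
dat_base <- data.frame(ID = names(sales2),
                       Value = unname(collapsed),
                       stringsAsFactors = FALSE)
dat_base
```

Using `vapply()` with `character(1)` rather than `sapply()` makes the result type explicit, so an unexpected element in the list raises an error instead of silently changing the output shape.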

Update

After reading the original poster's comment, I decided to update my solution to count how many rows contain two specified strings.

Here each row corresponds to a unique ID, so I assume the request can be rephrased as "How many IDs contain both of two specified strings?" If so, I would prefer not to collapse the observations. Once all observations are collapsed into one row per ID, we would need a strategy for matching strings inside the combined value, such as regular expressions. I am not familiar with regular expressions, so I will leave that approach for others.
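
If the collapsed, one-row-per-ID form is needed anyway, one way to sidestep regular expressions might be to split each string back on the separator and test membership with `%in%`. The following is a minimal sketch, assuming the values were collapsed with a single space and that no item code itself contains a space; `collapsed`, `targets`, `items`, and `has_both` are illustrative names.

```r
# Collapsed values keyed by ID (a shortened, hypothetical copy of the data)
collapsed <- c(`7143443` = "SSS-U-CCXVCSS1 L-CCX-8GETTS-LIC",
               `7208993` = "NFFGSR4= 1MV-FT-1= VI-NT/TE=",
               `7241758` = "PW_SQSGG=")

# The two items that must appear together
targets <- c("SSS-U-CCXVCSS1", "L-CCX-8GETTS-LIC")

# Split on a literal space; fixed = TRUE avoids regex interpretation
items <- strsplit(collapsed, " ", fixed = TRUE)

# TRUE for each ID whose items include every target
has_both <- vapply(items, function(x) all(targets %in% x), logical(1))

# IDs that contain both targets
names(has_both)[has_both]
```

Comparing whole items instead of searching for substrings avoids false matches when one item code happens to be a prefix of another.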

In addition, the original poster did not specify which two strings to target, so I wrote the solution so that users can swap in their own target strings.

dat <- map(sales2, ~ tibble(Value = .x)) %>%          # Convert each list element to a one-column tibble
  bind_rows(.id = "ID")                               # Combine all data frames, storing list names in ID

# After this, there is no need to collapse the rows

# Set the target strings; users can change them here
target_string1 <- c("SSS-U-CCXVCSS1", "L-CCX-8GETTS-LIC")

dat2 <- dat %>%
  filter(Value %in% target_string1) %>%               # Keep rows whose Value is one of the target strings
  distinct(ID, Value, .keep_all = TRUE) %>%           # Drop exact duplicates of the same ID and Value
  count(ID) %>%                                       # Count the remaining rows per ID
  filter(n > 1) %>%                                   # Keep only IDs that matched both target strings
  select(ID)

dat2

# A tibble: 1 × 1
       ID
    <chr>
1 7143443
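
The question also asks for the probability of the pair occurring. Assuming this means the share of IDs that contain both items, a rough sketch working directly on the original list could look like the following; `targets`, `has_both`, `n_both`, and `share_both` are names chosen for illustration.

```r
# Example list, repeated so this block runs on its own
sales2 <- list(`7143443` = c("SSS-U-CCXVCSS1", "L-CCX-8GETTS-LIC"),
               `7208993` = c("NFFGSR4=", "1MV-FT-1=", "VI-NT/TE="),
               `7241758` = "PW_SQSGG=",
               `9273628` = c("O1941-SE9", "CCO887VA-K9", "2901-SEC/K9", "CO1941-C/K9"),
               `9371709` = c("HGR__SASS=", "WWQTTB0S-L", "WS-RRRT48FP", "WTTTF24PS-L",
                             "GEDQTT8TS-L", "WD-SRNS-2S-L"),
               `9830473` = "SPA$FFSB0S")

# The two items that must appear together
targets <- c("SSS-U-CCXVCSS1", "L-CCX-8GETTS-LIC")

# TRUE for each ID whose items include every target
has_both <- vapply(sales2, function(x) all(targets %in% x), logical(1))

n_both <- sum(has_both)       # Number of IDs containing both items
share_both <- mean(has_both)  # Share of all IDs containing both items

n_both
share_both
```

Because `all(targets %in% x)` checks every element of `targets`, the same code should also work unchanged if three or more items are listed.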

Upvotes: 1
