Earl Mascetti
Earl Mascetti

Reputation: 1336

Create columns from unique element from data.frame in R

I need your help to solve the end of my script.

The problem is that I need to obtain from this table

ID          CITY
105274551   Roma 
104746182   Roma 
105486829   Roma 
105479022   Roma
105307655   Roma 
105384177   Roma 
105384177   Milano 
105384177   Reggio 
105486831   Aosta 
105384177   Milano 
105486833   Milano
105384177   Napoli 
105486835   Verona 
105486836   Firenze 
105307655   Genova
105307655   Firenze

this table:

   ID       CITY_1  CITY_2  CITY_3  CITY_4  CITY_5
105274551    Roma     0      0        0      0
104746182    Roma     0      0        0      0
105486829    Roma     0      0        0      0
105479022    Roma     0      0        0      0
105307655    Roma   Genova  Firenze   0      0
105384177    Roma   Milano  Reggio  Napoli   0
105486831    Aosta    0      0        0      0
105486833    Milano   0      0        0      0
105486835    Verona   0      0        0      0
105486836   Firenze   0      0        0      0

My goal is to create many columns as much as there are values in the city column.

I'd like to know if is there some library or some tutorial/answer that could help me to create this type of script.

Thank you in advance.

Francesco

Upvotes: 1

Views: 38

Answers (3)

akrun
akrun

Reputation: 887951

We can use dcast from data.table

library(data.table)
dcast(setDT(df), ID ~ paste0("CITY_", rowid(ID)), value.var = 'CITY')
#          ID  CITY_1 CITY_2  CITY_3 CITY_4 CITY_5
# 1: 104746182    Roma   <NA>    <NA>   <NA>   <NA>
# 2: 105274551    Roma   <NA>    <NA>   <NA>   <NA>
# 3: 105307655    Roma Genova Firenze   <NA>   <NA>
# 4: 105384177    Roma Milano  Reggio Milano Napoli
# 5: 105479022    Roma   <NA>    <NA>   <NA>   <NA>
# 6: 105486829    Roma   <NA>    <NA>   <NA>   <NA>
# 7: 105486831   Aosta   <NA>    <NA>   <NA>   <NA>
# 8: 105486833  Milano   <NA>    <NA>   <NA>   <NA>
# 9: 105486835  Verona   <NA>    <NA>   <NA>   <NA>
#10: 105486836 Firenze   <NA>    <NA>   <NA>   <NA>

data

df <- structure(list(ID = c(105274551L, 104746182L, 105486829L, 105479022L, 
105307655L, 105384177L, 105384177L, 105384177L, 105486831L, 105384177L, 
105486833L, 105384177L, 105486835L, 105486836L, 105307655L, 105307655L
), CITY = c("Roma", "Roma", "Roma", "Roma", "Roma", "Roma", "Milano", 
"Reggio", "Aosta", "Milano", "Milano", "Napoli", "Verona", "Firenze", 
"Genova", "Firenze")), row.names = c(NA, -16L), class = "data.frame")

Upvotes: 1

SRG
SRG

Reputation: 345

Using dictionary you can map ID as dictionary key and append cities of same IDs as values

import pandas as pd
#taken sample records only
dict1={'ID':[105307655,105479022,105307655,105307655,105486833,105307655],'CITY':[ 'Roma','Roma','Milano','Reggio','Milano','Napoli']}
df=pd.DataFrame(dict1)


dictfinal={}

for i in range(0,len(dict1['ID'])):
    key=dict1['ID'][i]
    if ID as key is already present in dictionary just append the city to value
    if(key in dictfinal.keys()):     

        dictfinal[key].append(dict1['CITY'][i])
    #if ID is not present in dictionary ,then first add ID as key to dictionary and the add value as list to dictionary.
    adding value as list because we can append more cities to list
    else:

        dictfinal[key]=[dict1['CITY'][i]]

#printing final dictionary

for k,v in dictfinal.items():
    print(k,v)

output

105307655 ['Roma', 'Milano', 'Reggio', 'Napoli']
105479022 ['Roma']
105486833 ['Milano']

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 389325

We can create a new column with "CITY" and its row number (col) for each ID and then get data into wide format.

library(dplyr)

df %>%
  group_by(ID) %>%
  mutate(col = paste0("CITY_", row_number())) %>%
  tidyr::pivot_wider(names_from = col, values_from = CITY)

#          ID CITY_1  CITY_2 CITY_3  CITY_4 CITY_5
#       <int> <fct>   <fct>  <fct>   <fct>  <fct> 
# 1 105274551 Roma    NA     NA      NA     NA    
# 2 104746182 Roma    NA     NA      NA     NA    
# 3 105486829 Roma    NA     NA      NA     NA    
# 4 105479022 Roma    NA     NA      NA     NA    
# 5 105307655 Roma    Genova Firenze NA     NA    
# 6 105384177 Roma    Milano Reggio  Milano Napoli
# 7 105486831 Aosta   NA     NA      NA     NA    
# 8 105486833 Milano  NA     NA      NA     NA    
# 9 105486835 Verona  NA     NA      NA     NA    
#10 105486836 Firenze NA     NA      NA     NA    

This would give NA's for missing values (which I would recommend), if you need 0's add values_fill = list(CITY = 0)) in pivot_wider.

data

df <- structure(list(ID = c(105274551L, 104746182L, 105486829L, 105479022L, 
105307655L, 105384177L, 105384177L, 105384177L, 105486831L, 105384177L, 
105486833L, 105384177L, 105486835L, 105486836L, 105307655L, 105307655L
), CITY = c("Roma", "Roma", "Roma", "Roma", "Roma", "Roma", "Milano", 
"Reggio", "Aosta", "Milano", "Milano", "Napoli", "Verona", "Firenze", 
"Genova", "Firenze")), row.names = c(NA, -16L), class = "data.frame")

Upvotes: 1

Related Questions