Reputation: 1336
I need your help to solve the end of my script.
The problem is that I need to obtain from this table
ID CITY
105274551 Roma
104746182 Roma
105486829 Roma
105479022 Roma
105307655 Roma
105384177 Roma
105384177 Milano
105384177 Reggio
105486831 Aosta
105384177 Milano
105486833 Milano
105384177 Napoli
105486835 Verona
105486836 Firenze
105307655 Genova
105307655 Firenze
this table:
ID CITY_1 CITY_2 CITY_3 CITY_4 CITY_5
105274551 Roma 0 0 0 0
104746182 Roma 0 0 0 0
105486829 Roma 0 0 0 0
105479022 Roma 0 0 0 0
105307655 Roma Genova Firenze 0 0
105384177 Roma Milano Reggio Napoli 0
105486831 Aosta 0 0 0 0
105486833 Milano 0 0 0 0
105486835 Verona 0 0 0 0
105486836 Firenze 0 0 0 0
My goal is to create many columns as much as there are values in the city column.
I'd like to know if is there some library or some tutorial/answer that could help me to create this type of script.
Thank you in advance.
Francesco
Upvotes: 1
Views: 38
Reputation: 887901
We can use dcast
from data.table
library(data.table)
dcast(setDT(df), ID ~ paste0("CITY_", rowid(ID)), value.var = 'CITY')
# ID CITY_1 CITY_2 CITY_3 CITY_4 CITY_5
# 1: 104746182 Roma <NA> <NA> <NA> <NA>
# 2: 105274551 Roma <NA> <NA> <NA> <NA>
# 3: 105307655 Roma Genova Firenze <NA> <NA>
# 4: 105384177 Roma Milano Reggio Milano Napoli
# 5: 105479022 Roma <NA> <NA> <NA> <NA>
# 6: 105486829 Roma <NA> <NA> <NA> <NA>
# 7: 105486831 Aosta <NA> <NA> <NA> <NA>
# 8: 105486833 Milano <NA> <NA> <NA> <NA>
# 9: 105486835 Verona <NA> <NA> <NA> <NA>
#10: 105486836 Firenze <NA> <NA> <NA> <NA>
df <- structure(list(ID = c(105274551L, 104746182L, 105486829L, 105479022L,
105307655L, 105384177L, 105384177L, 105384177L, 105486831L, 105384177L,
105486833L, 105384177L, 105486835L, 105486836L, 105307655L, 105307655L
), CITY = c("Roma", "Roma", "Roma", "Roma", "Roma", "Roma", "Milano",
"Reggio", "Aosta", "Milano", "Milano", "Napoli", "Verona", "Firenze",
"Genova", "Firenze")), row.names = c(NA, -16L), class = "data.frame")
Upvotes: 1
Reputation: 345
import pandas as pd
#taken sample records only
dict1={'ID':[105307655,105479022,105307655,105307655,105486833,105307655],'CITY':[ 'Roma','Roma','Milano','Reggio','Milano','Napoli']}
df=pd.DataFrame(dict1)
dictfinal={}
for i in range(0,len(dict1['ID'])):
key=dict1['ID'][i]
if ID as key is already present in dictionary just append the city to value
if(key in dictfinal.keys()):
dictfinal[key].append(dict1['CITY'][i])
#if ID is not present in dictionary ,then first add ID as key to dictionary and the add value as list to dictionary.
adding value as list because we can append more cities to list
else:
dictfinal[key]=[dict1['CITY'][i]]
#printing final dictionary
for k,v in dictfinal.items():
print(k,v)
105307655 ['Roma', 'Milano', 'Reggio', 'Napoli']
105479022 ['Roma']
105486833 ['Milano']
Upvotes: 0
Reputation: 389275
We can create a new column with "CITY" and its row number (col
) for each ID
and then get data into wide format.
library(dplyr)
df %>%
group_by(ID) %>%
mutate(col = paste0("CITY_", row_number())) %>%
tidyr::pivot_wider(names_from = col, values_from = CITY)
# ID CITY_1 CITY_2 CITY_3 CITY_4 CITY_5
# <int> <fct> <fct> <fct> <fct> <fct>
# 1 105274551 Roma NA NA NA NA
# 2 104746182 Roma NA NA NA NA
# 3 105486829 Roma NA NA NA NA
# 4 105479022 Roma NA NA NA NA
# 5 105307655 Roma Genova Firenze NA NA
# 6 105384177 Roma Milano Reggio Milano Napoli
# 7 105486831 Aosta NA NA NA NA
# 8 105486833 Milano NA NA NA NA
# 9 105486835 Verona NA NA NA NA
#10 105486836 Firenze NA NA NA NA
This would give NA's for missing values (which I would recommend), if you need 0's add values_fill = list(CITY = 0))
in pivot_wider
.
data
df <- structure(list(ID = c(105274551L, 104746182L, 105486829L, 105479022L,
105307655L, 105384177L, 105384177L, 105384177L, 105486831L, 105384177L,
105486833L, 105384177L, 105486835L, 105486836L, 105307655L, 105307655L
), CITY = c("Roma", "Roma", "Roma", "Roma", "Roma", "Roma", "Milano",
"Reggio", "Aosta", "Milano", "Milano", "Napoli", "Verona", "Firenze",
"Genova", "Firenze")), row.names = c(NA, -16L), class = "data.frame")
Upvotes: 1