Reputation: 471
I have data that gets imported into a Pandas dataframe where elements that are lists are automatically separated into new columns. My data was originally in .root files and I am importing it into Pandas with Uproot.
Below is example data where the columns physics[0] and physics[1] were originally elements of a list:
import pandas as pd

data = {'physics[0]': [1,2,3], 'physics[1]': [4,5,6], 'yes': [7,8,9], 'no': [10,11,12]}
df = pd.DataFrame(data)
   physics[0]  physics[1]  yes  no
0           1           4    7  10
1           2           5    8  11
2           3           6    9  12
I am trying to come up with a technique to detect similar column names and recreate the elements as a list. Here is what I have so far:
lst = [col for col in df.columns if 'physics' in col]
df['physics'] = df[lst].values.tolist()
df = df.drop(columns=lst)  # drop the original split columns
   yes  no physics
0    7  10  [1, 4]
1    8  11  [2, 5]
2    9  12  [3, 6]
which works. I won't always know beforehand what the column names will be when this happens, but I'd like to be able to detect automatically whether the names are similar and then perform the above list comprehension.
Upvotes: 1
Views: 695
Reputation: 1054
You can generalize your approach using regex:
import re

# create dictionary d of all groups of similar columns
multi_cols = filter(lambda x: re.search(r'\[[0-9]+\]$', x), df.columns)
d = {}
for c in multi_cols:
    k = re.sub(r'\[[0-9]+\]$', '', str(c))
    if k not in d:
        d[k] = []
    d[k].append(c)
# the dictionary will look like this:
print(d)
# {'physics': ['physics[0]', 'physics[1]']}
# use dictionary d to combine all similar columns in each group
for k in d:
    df[k] = df[d[k]].values.tolist()
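Note that this keeps the original physics[0] and physics[1] columns alongside the new physics column. If you also want to remove them, so the result matches the output shown in the question, a minimal follow-up sketch would be:

for k in d:
    df = df.drop(columns=d[k])  # remove the split columns that were combined

print(df)
#    yes  no physics
# 0    7  10  [1, 4]
# 1    8  11  [2, 5]
# 2    9  12  [3, 6]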
Upvotes: 2
Reputation: 126
Can we assume that any duplicate columns will always contain [0] at the end?
What about something like this:
import pandas as pd

data = {'physics[0]': [1,2,3], 'physics[1]': [4,5,6], 'yes': [7,8,9], 'no': [10,11,12]}
df = pd.DataFrame(data)

# collect the base names of columns that end in [0]
duplicates = set()
for c in df.columns:
    if c.endswith('[0]') and c.replace('[0]', '') not in duplicates:
        duplicates.add(c.replace('[0]', ''))

# combine every column whose name contains a base name into one list column
for d in duplicates:
    lst = [col for col in df.columns if d in col]
    df[d] = df[lst].values.tolist()
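One caveat with the substring test if d in col: a base name like physics would also match an unrelated column whose name merely contains it (say, a hypothetical physicsExtra column). A stricter variant, sketched here on the assumption that grouped columns always follow the bracketed-index naming, only accepts the base name followed by [n]:

import re

for d in duplicates:
    # match 'physics[0]', 'physics[1]', ... but not 'physicsExtra'
    lst = [col for col in df.columns if re.fullmatch(re.escape(d) + r'\[[0-9]+\]', col)]
    df[d] = df[lst].values.tolist()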
Upvotes: 0