Reputation: 13
I have a data frame with only one column which I would like to extract certain items and turn them into a separate column to be used as a label for the other remaining items. It's a bit hard to explain, if for example what I have is:
pd.DataFrame({'Fruits': ['Apple', 'Gala', 'Fuji', 'Grannysmith', 'Honeycrisp', 'Golden', 'pink', 'Orange', 'blood orange', 'Mandrin', 'Tangerine', 'Clementine', 'Banana', 'baby', 'manzano', 'burro']})
Fruits
0 Apple
1 Gala
2 Fuji
3 Grannysmith
4 Honeycrisp
5 Golden
6 pink
7 Orange
8 blood orange
9 Mandrin
10 Tangerine
11 Clementine
12 Banana
13 baby
14 manzano
15 burro
But what I'm trying to convert it into is:
Fruits Types
0 Apple Gala
1 Apple Fuji
2 Apple Grannysmith
3 Apple Honeycrisp
4 Apple Golden
5 Apple pink
6 Orange blood orange
7 Orange Mandrin
8 Orange Tangerine
9 Orange Clementine
10 Banana baby
11 Banana manzano
12 Banana burro
How would I transform the 1st data frame into the 2nd? I feel stump by this, especially when there can be many categories of fruit and their respective types.
Upvotes: 1
Views: 99
Reputation: 294488
I'd build a dictionary mapping with some standard logic then use it with Pandas operations
fruit_classes = ['Apple', 'Orange', 'Banana']
last_class = None
fruit_map = {}
for fruit in df.Fruits:
if fruit in fruit_classes:
last_class = fruit
elif last_class is not None:
fruit_map[fruit] = last_class
df.assign(Types=df.Fruits, Fruits=df.Fruits.map(fruit_map)).dropna()
Fruits Types
1 Apple Gala
2 Apple Fuji
3 Apple Grannysmith
4 Apple Honeycrisp
5 Apple Golden
6 Apple pink
8 Orange blood orange
9 Orange Mandrin
10 Orange Tangerine
11 Orange Clementine
13 Banana baby
14 Banana manzano
15 Banana burro
Alternatively with a comprehension
fruit_classes = ['Apple', 'Orange', 'Banana']
fruit_classes = ['Apple', 'Orange', 'Banana']
pd.DataFrame(
[[x, None][::(x in fruit_classes) * 2 - 1] for x in df.Fruits],
columns=['Fruits', 'Types']
).assign(Fruits=lambda d: d.Fruits.ffill()).dropna()
Fruits Types
1 Apple Gala
2 Apple Fuji
3 Apple Grannysmith
4 Apple Honeycrisp
5 Apple Golden
6 Apple pink
8 Orange blood orange
9 Orange Mandrin
10 Orange Tangerine
11 Orange Clementine
13 Banana baby
14 Banana manzano
15 Banana burro
Upvotes: 1
Reputation: 863156
First is necessary define fruits in list, then create new column with repeating fruits by where
with isin
for missing values and forward filling, then remove same values in both columns by boolean indexing
and last set new columns names:
L = ['Apple','Orange','Banana']
df['a'] = df['Fruits'].where(df['Fruits'].isin(L)).ffill()
df = df.loc[df['a'] != df['Fruits'], ['a','Fruits']]
df.columns = ['Fruits','Types']
print (df)
Fruits Types
1 Apple Gala
2 Apple Fuji
3 Apple Grannysmith
4 Apple Honeycrisp
5 Apple Golden
6 Apple pink
8 Orange blood orange
9 Orange Mandrin
10 Orange Tangerine
11 Orange Clementine
13 Banana baby
14 Banana manzano
15 Banana burro
Upvotes: 1