Reputation: 125
I have the following data frame:
a1 | a2 | a3 | a4
---------------------
Bob | Cat | Dov | Edd
Cat | Dov | Bob | Edd
Edd | Cat | Dov | Bob
and I want to convert it to
Bob | Cat | Dov | Edd
---------------------
a1 | a2 | a3 | a4
a3 | a1 | a2 | a4
a4 | a2 | a3 | a1
Note that the number of columns equals the number of unique values, and the number and order of rows are preserved
Upvotes: 7
Views: 788
Reputation: 29719
1) Required approach:
A faster implementation would be to sort the values of the dataframe and align the columns accordingly based on it's obtained indices after np.argsort
.
pd.DataFrame(df.columns[np.argsort(df.values)], df.index, np.unique(df.values))
Applying np.argsort
gives us the data we are looking for:
df.columns[np.argsort(df.values)]
Out[156]:
Index([['a1', 'a2', 'a3', 'a4'], ['a3', 'a1', 'a2', 'a4'],
['a4', 'a2', 'a3', 'a1']],
dtype='object')
2) Slow generalized approach:
More generalized approach while at the cost of some speed / efficiency would be to use apply
after creating a dict
mapping of the strings/values present in the dataframe with their corresponding column names.
Use a dataframe constructor later after converting the obtained series to their list
representation.
pd.DataFrame(df.apply(lambda s: dict(zip(pd.Series(s), pd.Series(s).index)), 1).tolist())
3) Faster generalized approach:
After obtaining a list of dictionaries from df.to_dict
+ orient='records'
, we need to swap it's respective key and value pairs while iterating through them in a loop.
pd.DataFrame([{val:key for key, val in d.items()} for d in df.to_dict('r')])
Sample test case:
df = df.assign(a5=['Foo', 'Bar', 'Baz'])
Both these approaches produce:
@piRSquared EDIT 1
generalized solution
def nic(df):
v = df.values
n, m = v.shape
u, inv = np.unique(v, return_inverse=1)
i = df.index.values
c = df.columns.values
r = np.empty((n, len(u)), dtype=c.dtype)
r[i.repeat(m), inv] = np.tile(c, n)
return pd.DataFrame(r, i, u)
1 I would like to thank user @piRSquared for coming up with a really fast and generalized numpy based alternative soln.
Upvotes: 9
Reputation: 294536
numpy
+ pandas
v = df.values
n, m = v.shape
i = df.index.values
c = df.columns.values
# create series with values that were column values
# create multi index with first level from existing index
# and second level from flattened existing values
# then unstack
pd.Series(
np.tile(c, n),
[i.repeat(m), v.ravel()]
).unstack()
Bob Cat Dov Edd
0 a1 a2 a3 a4
1 a3 a1 a2 a4
2 a4 a2 a3 a1
Upvotes: 1
Reputation: 215127
You can reshape it with stack and unstack with a swapping of the values and index:
df_swap = (df.stack() # reshape the data frame to long format
.reset_index(level = 1) # set the index(column headers) as a new column
.set_index(0, append=True) # set the values as index
.unstack(level=1)) # reshape the data frame to wide format
df_swap.columns = df_swap.columns.get_level_values(1) # drop level 0 in the column index
df_swap
Upvotes: 5