Reputation: 76

How to convert multiple columns in a DataFrame into a NumPy array?

I have a DataFrame that contains data for different students going to school. It has columns such as rank, major_code, major, unemployed, etc.

I used df.values (in this case recent_grads.values) to return a NumPy representation of the DataFrame.

recent_grads_np = recent_grads.values
print(recent_grads_np)

This works, as df.values converts the entire DataFrame into a NumPy array. The result is this:

[[1 2419 'PETROLEUM ENGINEERING' ... 1534 364 193]
 [2 2416 'MINING AND MINERAL ENGINEERING' ... 350 257 50]
 [3 2415 'METALLURGICAL ENGINEERING' ... 456 176 0]
 ...
 [172 5203 'COUNSELING PSYCHOLOGY' ... 2403 1245 308]
 [173 3501 'LIBRARY SCIENCE' ... 288 338 192]]

How do I select only a few columns from the DataFrame and then convert them into a NumPy array?

Upvotes: 5

Answers (2)

Felipe De Morais

Reputation: 11

I know this is an old post, but I also needed something similar, and I found the following approach. You can also use the apply function with lambda. Thus, you can create your own type of array and even convert the data, like this:

raw_converted_data = df[["A", "B", "C"]].apply(lambda x: [x["A"], x["B"], x["C"]], axis=1)

list(raw_converted_data)

In this example, the first line returns a pandas Series in which each element is a list holding that row's values for "A", "B", and "C". The second line then turns that Series into a plain Python list of lists.
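To make the apply approach concrete, here is a minimal sketch. The tiny DataFrame and its values are made up for illustration, and the final np.array call is an extra step (not part of the answer above) added because the question asks for a NumPy array rather than a list.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; the column names match the answer above.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

# apply(..., axis=1) calls the lambda once per row; because the lambda
# returns a list, the result is a Series whose elements are lists.
rows = df[["A", "B", "C"]].apply(lambda x: [x["A"], x["B"], x["C"]], axis=1)

# list(...) gives a list of lists; np.array(...) turns that into a 2-D array.
arr = np.array(list(rows))
print(arr.shape)
```

For a plain column selection this should give the same values as df[["A", "B", "C"]].to_numpy(), which avoids the per-row Python call. The apply route is mainly worth it when the lambda also transforms the values.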

I hope this answer helps someone else also looking for this.

Upvotes: 0

OmidHM

Reputation: 51

You can simply append ".values" to your DataFrame, and it will give you a NumPy array.

To select specific DataFrame columns, you can write df[["A","B","C"]], where "A", "B", and "C" are your column names.

So: df[["A","B","C"]].values

will give you what you asked for.
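As a rough sketch of how this looks with the column names from the question: the three-row DataFrame below is a hypothetical stand-in for recent_grads, and the unemployed figures are invented for illustration. It uses to_numpy(), which the pandas documentation recommends over .values; both return the same array here.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for recent_grads; the "unemployed" numbers are made up.
recent_grads = pd.DataFrame({
    "rank": [1, 2, 3],
    "major_code": [2419, 2416, 2415],
    "major": [
        "PETROLEUM ENGINEERING",
        "MINING AND MINERAL ENGINEERING",
        "METALLURGICAL ENGINEERING",
    ],
    "unemployed": [37, 85, 16],
})

# Select only the wanted columns, then convert the selection to a NumPy array.
subset = recent_grads[["rank", "major_code", "unemployed"]].to_numpy()
print(subset.shape)  # (3, 3): three rows, three selected columns

# Mixing numeric and text columns yields an array with dtype object.
mixed = recent_grads[["rank", "major"]].to_numpy()
print(mixed.dtype)  # object
```

Selecting only numeric columns keeps a numeric dtype, which is usually what you want before doing arithmetic on the array.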

Upvotes: 4
