hagewhy

Reputation: 79

R: Finding out the corresponding value for a category in a data frame

My original data has about 1000 observations and the following variables:

$Nationality : Factor "American" "Korean" ...

$Food : Factor "Milk" "Fruits" "Rice"

$No. of servings : num 5 6 3

I want to construct a table that shows, for Nationality == "American", each Food that they eat and its corresponding No. of servings.

Since my original data is huge, I first tried to subset it using: American = subset(originaldata, Nationality == "American"), to create a data frame that contains only the records with American nationality.

Then I applied the table() function to the subsetted data (i.e. American) using: table(American$Food, American$No. of servings)

The results, instead of containing only the Nationality == "American" records, also contained the records of all other nationalities.

Why is this so? Is there a way to work around this problem? I want a table that contains only the records with Nationality == "American", showing Food and No. of servings in two columns.

Upvotes: 1


Answers (3)

Bg1850

Reputation: 3082

For large-scale data, use data.table. If I understand your problem correctly, it should be achievable with the following:

library(data.table)
# Convert the data frame to a data.table
dt <- as.data.table(your_data)
# Show all columns, grouped by Nationality
dt[, .SD, by = Nationality]

With the data that @Sotos provided, it would look like this:

> dt <- as.data.table(x)
> dt[,.SD,Nationality]
   Nationality   Food No.ofServings
1:    American Fruits             3
2:    American   rise             5
3:    American  pasta             9
4:      Korean   meat             6
5:     British Fruits             2

Filtering is just as easy:

> dt[Nationality=="American"]
   Nationality   Food No.ofServings
1:    American Fruits             3
2:    American   rise             5
3:    American  pasta             9
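
If only the Food and servings columns are wanted, the filter can be combined with a column selection in the same call. This is a minimal sketch, assuming the column is named No.ofServings as in @Sotos's sample data (your real column name may differ); `x` is rebuilt here so the snippet runs on its own.

```r
library(data.table)

# Sample data mirroring the dput() output in Sotos's answer
x <- data.frame(
  Nationality = factor(c("American", "American", "Korean", "British", "American")),
  Food = factor(c("Fruits", "rise", "meat", "Fruits", "pasta")),
  No.ofServings = c(3, 5, 6, 2, 9)
)
dt <- as.data.table(x)

# First argument filters rows, second argument picks the columns to keep
american <- dt[Nationality == "American", .(Food, No.ofServings)]
print(american)
```

The result is a data.table with one row per American record and only the two requested columns.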

Upvotes: 0

Alias

Reputation: 149

Alternatively, with the dplyr package:

install.packages("dplyr")

library(dplyr)

AmericanData = filter(yourdata, Nationality == "American")
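
To get just the two columns asked for, the filter can be followed by a select. This is a rough sketch: `yourdata` below is a stand-in built from the sample data in @Sotos's answer, and the column name No.ofServings is an assumption taken from that sample.

```r
library(dplyr)

# Stand-in for `yourdata`, copied from the sample data in Sotos's answer
yourdata <- data.frame(
  Nationality = factor(c("American", "American", "Korean", "British", "American")),
  Food = factor(c("Fruits", "rise", "meat", "Fruits", "pasta")),
  No.ofServings = c(3, 5, 6, 2, 9)
)

AmericanData <- yourdata %>%
  filter(Nationality == "American") %>%  # keep only the American rows
  select(Food, No.ofServings)            # keep only the two columns of interest

print(AmericanData)
```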

Upvotes: 0

Sotos

Reputation: 51582

You can split your data by nationality and then extract 'American':

list1 <- split(originaldata, originaldata$Nationality)
list1$American
#  Nationality   Food No.ofServings
#1    American Fruits             3
#2    American   rise             5
#5    American  pasta             9
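
As for why other categories can still show up after subsetting: one likely cause, assuming the columns are factors as in the question, is that a subset keeps every original factor level even when no rows use it, and table() reports all levels. The sketch below uses a copy of the sample data from the DATA section of this answer and drops the unused levels with droplevels() before tabulating.

```r
# Copy of the sample data from the DATA section of this answer
originaldata <- data.frame(
  Nationality = factor(c("American", "American", "Korean", "British", "American")),
  Food = factor(c("Fruits", "rise", "meat", "Fruits", "pasta")),
  No.ofServings = c(3, 5, 6, 2, 9)
)

# Subsetting keeps every factor level, including ones with no rows left
american_raw <- subset(originaldata, Nationality == "American")
levels(american_raw$Nationality)  # "American" "British" "Korean"

# droplevels() removes the unused levels from all factor columns
american <- droplevels(american_raw)
levels(american$Nationality)      # "American"

# Two-column view of Food and its servings, then the cross-tabulation
american[, c("Food", "No.ofServings")]
table(american$Food, american$No.ofServings)
```

After droplevels(), the table only has rows for foods that actually occur in the American records.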

DATA

dput(originaldata)
structure(list(Nationality = structure(c(1L, 1L, 3L, 2L, 1L), .Label = c("American", 
"British", "Korean"), class = "factor"), Food = structure(c(1L, 
4L, 2L, 1L, 3L), .Label = c("Fruits", "meat", "pasta", "rise"
), class = "factor"), No.ofServings = c(3, 5, 6, 2, 9)), .Names = c("Nationality", 
"Food", "No.ofServings"), row.names = c(NA, -5L), class = "data.frame")

Upvotes: 1
