Numerical Data Type (num): Numerical values (e.g., 1, 523, 3.45) are used for calculations. ZIP codes, in contrast, are not a numerical data type.
Character Data Type (chr): A character data type stores a sequence of characters, numbers, and/or symbols that form a word or even a sentence (e.g., first or last names, street addresses, or ZIP codes).
Factor Data Type (factor): A factor is an R data type that stores categorical data in an efficient way. Factor data types are also required by many classification models in R.
Logical Data Type (logi): A data type that stores the logical states TRUE and FALSE is called a logical object (sometimes called Boolean).
Numerical Data Type (num): Numerical values are used for calculations (therefore ZIP-Codes are not numerical). Numerical data can be discrete (integer) or continuous (double).
A=as.integer(2)
B=as.integer(3)
str(A) # str() returns structure of a variable
int 2
C=1.23
str(C) # str() returns structure of a variable
num 1.23
print(A*C)
[1] 2.46
A^B
[1] 8
A/B # Returns num type
[1] 0.6666667
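The distinction between integer and double can be checked directly with typeof(). The following is a minimal sketch; the variable names TypeA, TypeC, and TypeProduct are illustrative and not part of the original slides.

```r
A = as.integer(2)          # stored as an integer
C = 1.23                   # stored as a double

TypeA = typeof(A)          # "integer"
TypeC = typeof(C)          # "double"
TypeProduct = typeof(A*C)  # "double": mixing integer and double yields a double

print(typeof(2))           # a bare number is stored as a double by default
print(typeof(2L))          # the L suffix creates an integer
print(is.numeric(A))       # integers also count as numeric
```

Both integer and double values are reported as numeric by is.numeric(), which is why they can be mixed freely in calculations.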
Character Data Type (chr):
Note that what is called a character in R is often called a string in other programming languages.
Character values must be surrounded by quotes:
MyText="Hello world!"
print(MyText)
[1] "Hello world!"
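The ZIP code remark from earlier can be illustrated with a short sketch. The ZIP code 02134 is only an example value, and the variable names are illustrative: stored as character the leading zero survives, while converting to a number drops it.

```r
ZipAsText = "02134"                # character: the leading zero is kept
ZipAsNumber = as.numeric("02134")  # numeric: the leading zero is lost

print(ZipAsText)
print(ZipAsNumber)
print(nchar(ZipAsText))            # number of characters in the text version
```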
Character variables can be concatenated with the cat() command:
FirstName="Carsten"
LastName="Lange"
cat(FirstName, LastName) # R adds a space automatically
Carsten Lange
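Note that cat() prints the combined text but does not return it for later use. When the concatenated result should be stored in a variable, base R's paste() and paste0() can be used instead. A minimal sketch (the variable names FullName, ListName, and NoSpace are illustrative):

```r
FirstName = "Carsten"
LastName = "Lange"

FullName = paste(FirstName, LastName)              # joined with a space by default
ListName = paste(LastName, FirstName, sep = ", ")  # custom separator
NoSpace = paste0(FirstName, LastName)              # no separator at all

print(FullName)
print(ListName)
print(NoSpace)
```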
Factor Data Type (factor): A factor is an R data type that stores categorical data in an efficient way. Categorical data are character type data covering a few categories such as hair color (blonde, brown, red, black). They can be coded with numbers (e.g., 1-4 for the four hair colors) and thus use less memory. Another example is sex (male, female).
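A short sketch of how a character vector becomes a factor, assuming a made-up sample of six hair colors. levels() shows the categories and as.integer() shows the numeric codes R stores internally.

```r
# Made-up sample data for illustration
HairColor = c("blonde", "brown", "red", "black", "brown", "black")
HairFactor = factor(HairColor)

HairLevels = levels(HairFactor)     # distinct categories, sorted alphabetically
HairCodes = as.integer(HairFactor)  # integer code of each observation

print(HairLevels)
print(HairCodes)
str(HairFactor)
```

By default, the levels are ordered alphabetically, so the code 1 refers to "black" rather than to the first value in the data.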
Vector objects can be used as arguments for an R command to calculate:
MeanForecTemp=mean(VecTemp)
cat("The average forecasted temperature is", MeanForecTemp)
The average forecasted temperature is 64.33333
ForecDays=length(VecTemp)
cat("The forecast is for", ForecDays, "days.")
The forecast is for 3 days.
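The definition of VecTemp is not shown in this section. The sketch below assumes three hypothetical temperatures, chosen only so that the results match the output above, and shows how such a vector could be created with c() and accessed by position.

```r
# Hypothetical values: the original contents of VecTemp are not shown
VecTemp = c(60, 65, 68)

FirstTemp = VecTemp[1]             # R starts counting at 1
MeanForecTemp = mean(VecTemp)      # average of all elements
ForecDays = length(VecTemp)        # number of elements
Deviation = VecTemp - MeanForecTemp  # arithmetic is applied element by element

print(FirstTemp)
print(MeanForecTemp)
print(ForecDays)
print(Deviation)
```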
Data Frames (tibbles)
A data frame is similar to an Excel table (note: not all columns of the Titanic data frame are shown).
Survived  Pclass  Sex     Age  FareInPounds
0         3       male    22    7.2500
1         1       female  38   71.2833
1         3       female  26    7.9250
1         1       female  35   53.1000
0         3       male    35    8.0500
0         3       male    27    8.4583
0         1       male    54   51.8625
0         3       male     2   21.0750
1         3       female  27   11.1333
1         2       female  14   30.0708
1         3       female   4   16.7000
1         1       female  58   26.5500
A data frame consists of vectors that make up its columns. These are the variables for the data analysis (remember: observations are in the rows, variables are in the columns).
SurvRate=mean(DataTitanic$Survived)
cat("The average survival rate of Titanic passengers was:", SurvRate)
The average survival rate of Titanic passengers was: 0.3855693
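To make the "columns are vectors" idea concrete, the sketch below builds a small data frame by hand from the first four rows of the table above. The name DataSmall is illustrative; the full DataTitanic data set is not reproduced here.

```r
# First four rows of the Titanic table, entered by hand for illustration
DataSmall = data.frame(
  Survived = c(0, 1, 1, 1),
  Pclass = c(3, 1, 3, 1),
  Sex = c("male", "female", "female", "female"),
  Age = c(22, 38, 26, 35),
  FareInPounds = c(7.25, 71.2833, 7.925, 53.1)
)

str(DataSmall)                          # one line per column (variable)
SurvRateSmall = mean(DataSmall$Survived)  # the $ operator extracts one column as a vector
NumRows = nrow(DataSmall)               # number of observations
NumCols = ncol(DataSmall)               # number of variables

cat("Survival rate in the four-row sample:", SurvRateSmall)
```

Because the sample holds only four passengers, its survival rate differs from the rate computed on the full data set.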
Data Frames vs. Tibbles
A tibble is a more advanced sub-type of a data frame. If needed, a regular data frame can be coerced into a tibble with the as_tibble() command.
A few of the differences between data frames and tibbles:
A data frame outputs all its rows and columns by default. A tibble outputs only the first 10 rows and the variables that fit on the screen but provides information about omitted variables and rows.
A data frame can have row names, while a tibble cannot.
In R versions before 4.0, a data frame converted all character values to factor type by default. This conversion was often confusing and annoying. In contrast, a tibble only coerces character values into factor on demand. Since R version 4.0, regular data frames behave the same as tibbles in this respect.
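Whether character columns become factors can also be controlled explicitly through the stringsAsFactors argument of data.frame(), independent of the default of the installed R version. A minimal base R sketch with made-up names:

```r
# Made-up sample data for illustration
PassengerNames = c("Ann", "Bob", "Ann")

DfFactor = data.frame(Name = PassengerNames, stringsAsFactors = TRUE)   # convert to factor
DfChar = data.frame(Name = PassengerNames, stringsAsFactors = FALSE)    # keep as character

ClassFactor = class(DfFactor$Name)
ClassChar = class(DfChar$Name)

print(ClassFactor)
print(ClassChar)
```

Setting the argument explicitly keeps a script's behavior the same across R versions.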
Summary Data Types and Objects
How are Very Big Numbers Presented
The GDP for 2021 in the U.S. was $22,996,086,000,000 (rounded to millions).
In the U.S., a person has a 1-in-10,000 lifetime risk of being struck by lightning. Assuming a life span of 75 years and 365.25 days per year, the probability per day is:
$$\frac{1}{10{,}000 \cdot 365.25 \cdot 75} \approx 0.00000000365$$
What R Does
Let us see what R does:
ProbStruck=0.00000000365
cat("Probability of being struck by lightning on an average day:", ProbStruck)
Probability of being struck by lightning on an average day: 3.65e-09
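R applies the same scientific notation to very big numbers such as the GDP figure above. When a fully written-out number is preferred, base R's format() can be used. This is a minimal sketch; the variable names GdpFixed and GdpComma are illustrative.

```r
GDP = 22996086000000

cat(GDP, "\n")   # by default R switches to scientific notation for a number this wide

GdpFixed = format(GDP, scientific = FALSE)                  # all digits written out
GdpComma = format(GDP, big.mark = ",", scientific = FALSE)  # with thousands separators

cat(GdpFixed, "\n")
cat(GdpComma, "\n")
```

format() returns a character value, so the result is meant for display and not for further calculation.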