The R Advantage – Tip #1

This is the first entry in a series (The R Advantage) that will spotlight useful R tips that I’ve picked up along the way. The audience is anyone from the absolute beginner who may be on the fence about switching to R, to the person who is just starting out in R and learning simple commands, to the more intermediate user who is looking to expand his or her R knowledge.

These tips are simple to execute, require no deep coding knowledge, and pay dividends in saving you time and generally making your life easier.

One thing I would have really appreciated when I was just starting out in R was a step-by-step, intuitive explanation of what each line of code was doing, so I will provide one here to make things clear for beginners.

Without further ado, today’s R tip is: how to merge files that have unique observations (rows), but the same variable structure (columns).

Problem scenario: You have just run an experiment in a lab. Say you have collected reaction time data from participants, and your study produces one data file per participant. Come time for analysis, you have to somehow combine all of the data into a single file.

This problem frequently comes up in the social sciences (but probably shows up in many other fields too). In fact, this is the problem that first led me to want to switch away from doing analysis with GUIs like SPSS and to get further into unlocking the efficiency of R.

Before I knew how to use R, I would sometimes spend hours doing this aggregation process manually (opening each file, copying and pasting the data into a parent file, closing the child file, and then doing it again). It was exhausting, too mindless for me to enjoy, and was ultimately taking hours out of my day.

Solution: let R iterate through each file in your file directory, extract the contents, and build the central file for you.

Benefit: It literally saves you hours of time.

So here is the first chunk of code that personally convinced me of The R Advantage.

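Note: this code assumes that all of your data files are in the same folder, that they are the only files in that folder, and that the folder is your current working directory. It also assumes the variable names are the same across files. Because the merged file is saved to that same folder, move it elsewhere before re-running the code, or it will be read in as if it were another participant’s file.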
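As an optional aside before the main code, the assumption about matching variable names can be checked up front. This is only a minimal sketch: the helper name `same_columns`, the temporary folder, and the made-up participant files are all illustrative assumptions, not part of the original workflow.

```r
# Hypothetical helper: TRUE if every file has the same column names as the first
same_columns <- function(paths) {
  # Read just the header plus one data row from each file to get its column names
  headers <- lapply(paths, function(p) names(read.csv(p, nrows = 1)))
  # Compare every set of names against the first file's names
  all(vapply(headers, function(h) identical(h, headers[[1]]), logical(1)))
}

# Self-contained demo: two made-up participant files in a temporary folder
demo_dir <- file.path(tempdir(), "r_advantage_check")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1, rt = 512), file.path(demo_dir, "p01.csv"), row.names = FALSE)
write.csv(data.frame(id = 2, rt = 498), file.path(demo_dir, "p02.csv"), row.names = FALSE)

demo_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
columns_match <- same_columns(demo_files)
```

If `columns_match` comes back FALSE, it is worth opening the files to find the odd one out before merging.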


files_full <- list.files(getwd(), full.names=TRUE)
df <- data.frame() 

for (i in seq_along(files_full)) {
        currentFile <- read.csv(files_full[i])
        df <- rbind(df, currentFile)
}

write.csv(df, "merged_data.csv")

Line 1: Lists all files in the current working directory (with their full paths) and stores them in a variable called ‘files_full’

Line 2: Initializes an empty data frame (we will use this to build our central data file)

Line 4: Starts a for loop, a structure that ticks through a predetermined number of times (in this case, the number of files in the directory) and performs an operation on each iteration. Lines 5 and 6 detail the operation.

Line 5: Reads in the data from the current participant’s file and stores it in a variable called ‘currentFile’

Line 6: Updates the central data frame (df) with the current participant’s file contents

Line 9: Writes the central data frame to a merged data file in .csv format, saved in the current working directory.
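Once the loop above feels comfortable, the same idea can be written a little more defensively. The following is a sketch under stated assumptions rather than a replacement for the code above: the helper name `merge_csv_folder`, the temporary folders, and the sample data are all made up for illustration. It only picks up files ending in .csv, stacks them in a single step instead of growing `df` inside a loop, and saves the result outside the data folder so that a re-run does not read the merged file back in.

```r
# Hypothetical helper: merge every .csv file in `folder` into one data frame
merge_csv_folder <- function(folder) {
  # Only pick up files ending in .csv, so stray files in the folder are ignored
  paths <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)
  # Read each file into its own data frame, collected in a list
  pieces <- lapply(paths, read.csv)
  # Stack all of the data frames in one call
  do.call(rbind, pieces)
}

# Self-contained demo: two made-up participant files in a temporary folder
data_dir <- file.path(tempdir(), "r_advantage_merge")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(id = c(1, 1), rt = c(512, 430)),
          file.path(data_dir, "p01.csv"), row.names = FALSE)
write.csv(data.frame(id = c(2, 2), rt = c(498, 601)),
          file.path(data_dir, "p02.csv"), row.names = FALSE)

merged <- merge_csv_folder(data_dir)

# Save outside the data folder; row.names = FALSE leaves out the extra
# column of row numbers that write.csv adds by default
out_file <- file.path(tempdir(), "merged_data.csv")
write.csv(merged, out_file, row.names = FALSE)
```

To use this on real data, you would pass the path to your own data folder in place of `data_dir` and choose your own location for `out_file`.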

That’s all for this entry in The R Advantage. I hope you can see that with just a few lines of code you can gain back hours of time! I know you will want to try it out now. If you do, feel free to drop me a line in the comments and let me know how it works!
