Introduction to version control

Benjamin Brachi

2025-03-01

The course

All slides are available at:

https://umr-1202-biogeco.pages.mia.inra.fr/initiation_git

This presentation was prepared using Quarto, Git and Gitlab

The repository is available here:

https://forgemia.inra.fr/umr-1202-biogeco/initiation_git

The course was prepared by:

  • François Ehrenmann
  • Ludovic Duvaux
  • Benjamin Brachi

Introduction: How to organize our research project

Let’s role play

  • You just started your internship
  • In a few months, or right from the start, you will analyze data!
  • This likely means you will write some code for these analyses
  • If you don’t get organized from the start… it gets complicated

What most students do…

  • They create an “internship folder” and put everything in it: internship contract, PDFs of papers, and code
  • Everything together in “my documents”
  • Any other examples? (I’m looking at you and your messy desktop)

Let’s get organized

  • You are likely to be doing many things during your thesis/internship:
    • literature review
    • write a report or a thesis
    • do analyses for your first chapter…
    • for your second chapter…

Today, we pretend that you finally received some data, and we are going to start the analyses for this part of your project.

Step 1: create a folder to store each analysis

mkdir -p ~/sandbox/myinternship

Unix note

  • In Unix systems, “~” is a shortcut for your home directory, for example “/home/bbrachi”
  • A sandbox usually means a tightly controlled environment where an application runs; here it is simply a folder name, and the concept does not matter for this training.

On Windows

On Windows, create a folder in “Documents” for example, or in the “data” partition of your hard drive.

Step 1: create a folder to store each analysis

In this folder, you will organize your work in projects or subprojects.
Your first project is this training session.

mkdir ~/sandbox/myinternship/initiation_git
mkdir ~/sandbox/myinternship/analyse_chapitre1
mkdir ~/sandbox/myinternship/rapport

Step 2: Organize your working directory for the project

cd ~/sandbox/myinternship/initiation_git/
mkdir data
mkdir results
mkdir figures
mkdir scripts
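
The folder layout from Steps 1 and 2 can also be created in one go. This is a minimal sketch, assuming a POSIX shell; the “new_project” function name and the temporary base folder are illustrative choices, not part of the course material.

```shell
# Hypothetical helper: create the standard sub-folders for one project.
new_project() {
  project_dir=$1
  for sub in data results figures scripts; do
    # -p creates missing parent folders and does not fail if the folder exists
    mkdir -p "$project_dir/$sub"
  done
}

# Demo in a throwaway location instead of ~/sandbox (assumption, for safety).
base=$(mktemp -d)
new_project "$base/myinternship/initiation_git"
ls "$base/myinternship/initiation_git"
```

For real use, you would call it with a path such as ~/sandbox/myinternship/analyse_chapitre1.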

File and folder names

In file/folder names: avoid spaces, accents and special characters

mkdir "don't do that! Pitié@&!"

If needed, use underscores or camel case

mkdir possible_long_folder_name
mkdir AlsoPossibleName
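
To check a name before creating a folder, something like the following could be used. It is only a sketch: the “is_safe_name” helper is hypothetical, and the set of allowed characters (letters, digits, dot, underscore, hyphen) is an assumption about what counts as “safe”.

```shell
# Hypothetical helper: succeed only if a name uses letters, digits, ".", "_" or "-".
is_safe_name() (
  LC_ALL=C   # compare bytes, so accented letters never match A-Z or a-z
  case "$1" in
    "" | *[!A-Za-z0-9._-]*) exit 1 ;;   # empty, or contains any other character
    *) exit 0 ;;
  esac
)

for name in possible_long_folder_name AlsoPossibleName "don't do that! Pitié@&!"; do
  if is_safe_name "$name"; then
    echo "ok:    $name"
  else
    echo "avoid: $name"
  fi
done
```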

Step 3: Use a flavor of Markdown to document your work

Rmd, md, Qmd are text formats that allow combining analyses and text, and therefore to:

  • write down the question for each step of your analysis
  • document the analysis you do to answer each question
  • present the results/figures
  • draw conclusions from the results/figures
  • motivate the next step of your analyses
  • orchestrate the different scripts that perform your analyses

Step 3: Use a flavor of Markdown to document your work

Markdown makes it easy to generate reports

This presentation was written in Quarto Markdown, and the HTML version was produced very simply using a Visual Studio Code extension.

Create files in the directory

  • Open RStudio
  • create a new project in an existing folder
  • choose the folder we’ve just created
  • create
  • New file > Rmd
  • Save the file under the name you want.

Introduction: Why version control?

Why git?

  • Imagine you are working on a script (or an Rmd file)
  • obviously, you’re not going to write the code in one go and be satisfied
  • write, test, modify, start again…

Why git?

If your analysis requires a succession of scripts…

How do you know which version of each script was run last?

And if you have a doubt… you start again.

flowchart LR
  A[Data input] --> B(script1)
  B --> C(output1)
  C --> D(script2.V1)
  C --> D2(script2.V2)
  C --> D3(script2.V3)
  D --> E[Results]
  D2 --> E[Results]
  D3 --> E[Results]

Important

Git helps you keep a single version of each script while recording all the changes you made to your code!

What is git?

  • Git is a version control software
  • keeps the history of all the modifications made to all the files in a repository
  • allows going back to an earlier version of the project, or of any file…
  • allows comparing versions
  • allows creating branches if you want to “try something”

Why this git training?

  • Lots of resources available online
  • mostly dedicated to software development (dev)
    • quickly gets too complicated for basic research use (individual user or very small teams)
    • online tutorials rarely cover the parallel management of code AND data files
  • “setup” can be a little technical, we will help get you started

Git - getting started

Installing git

  • Linux (Debian/Ubuntu):
sudo apt install git

Warning

Should already be done!

Git at INRAE

  • Git = version control software
  • “Github” is a developer platform
  • “Gitlab” is an open-source software to run your own developer platform
  • We use an instance of Gitlab installed on INRAE servers (open-source)

Where do we find it: https://forgemia.inra.fr

Setup 1: Log in!

Home page

Setup 2: Create your personal access token

This will be the password you use to connect when you work from the terminal (i.e. when you perform actions such as “push” and “pull”)

  • click on your picture or avatar (your initials? top-right)
  • go to “Edit profile”
  • then in the bar on the left “Access Tokens”

Setup 2: Create your personal access token

  • Add new token
  • choose a name
  • select all tick boxes (not all are necessary but doesn’t matter)
  • remove the expiration date if there is one (and if it lets you)
  • Click on “Create personal access token”

Create your first repository

  • Click on “+” and “New project/repository”
  • “Create a blank project”

Create your first repository

  • select your user name in “Pick a group or namespace”
  • choose a name for your project -> initiation_git_bbrachi
  • choose the visibility level (Private)

Create your first repository

  • deselect “Initialize repository with a README” (it is selected by default)

Your first repository is created!

Upload files from your computer

Setting up your repository

git config --global user.name "Benjamin Brachi"
git config --global user.email "benjamin.brachi@inrae.fr"

Push an existing folder

  • Move to the folder we created earlier
cd ~/sandbox/myinternship/initiation_git
  • initialize git in the folder
git init --initial-branch=main

(creates a hidden “.git” folder)

  • Configure your local repository with the address of the remote location
git remote add origin git@forgemia.inra.fr:bbrachi/initiation_git_bbrachi.git

Push an existing folder

  • Add all the files (i.e. “stage” them)
git add .
  • commit your changes to the repository
git commit -m "Initial commit"

Push an existing folder

Push your local changes to the remote location

git push --set-upstream origin main

Notes

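  • The “--set-upstream origin main” option is needed only for the first “push” of your new repo
  • If git asks for a password here (this happens when the remote uses an HTTPS address), it is the access token we created earlier! With the “git@…” SSH address used above, git authenticates with an SSH key instead, which we set up later in this course.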
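
The whole “push an existing folder” sequence can be rehearsed offline. The sketch below assumes git 2.28 or later (for “--initial-branch”) and uses a local bare repository as a stand-in for the Gitlab remote; the folder names, the demo identity and the R file are hypothetical.

```shell
# Throwaway demo: a local bare repository stands in for the Gitlab remote (assumption).
demo=$(mktemp -d)
git init --quiet --bare --initial-branch=main "$demo/remote.git"

mkdir -p "$demo/initiation_git/scripts"
cd "$demo/initiation_git"
echo 'print("hello")' > scripts/analysis.R   # hypothetical file, so there is something to commit

git init --quiet --initial-branch=main
# Identity set for this repository only, so the demo does not touch your global settings
git config user.name "Demo User"
git config user.email "demo@example.org"

git remote add origin "$demo/remote.git"
git add .
git commit --quiet -m "Initial commit"
git push --quiet --set-upstream origin main

# The commit is now known on the remote side as well
git log --oneline origin/main
```

Note that git only tracks files: an empty folder such as “data” or “figures” is not part of a commit until it contains at least one file.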

Tip

Note that passwords don’t appear when you type them in the terminal. You can copy-paste (in the terminal: ‘Ctrl + Shift + V’ or left click)

Adding files and working with Gitlab

  • An important file in a repository is the “README” file.

  • We will create it now in our folder.

  • In the terminal type:

touch README.md
gedit README.md
  • Or create it with your favorite text editor (not Word!)

Writing the README file

In the README.md file let’s write something of the form:

# Git initiation course in BIOGECO

## Contributors
Where you write who's working on this project.

## Description
This is a test repository to practice using version control. 

## Content
Maybe a table of contents of this file if it is long.

## Requirements
What is needed to run these analyses in addition to what's in the repository?

Adding the README.md file to the repository

git status

Adding the README.md file to the repository

  • Staging:
git add README.md
  • Committing changes with a short description:
git commit -m "adding the README file"
  • Pushing to the remote repository!
git push

Check it out online!

This password is a pain… Let’s fix that

  • Setting up an SSH key means you don’t need to type your password for pushing/pulling.
  • Keys work in pairs: one private (stays on your computer), one public (uploaded to the remote).

Before we create a key, let’s see if you have one

  • Look for the .ssh/ folder in your home folder (it is a hidden directory, use ls -a to see it)
ls -a ~/

It’s not there? -> You don’t have keys.

  • On Windows, you probably don’t; if you did, you’d know…

Let’s create an SSH key pair

ssh-keygen -t ed25519 -C "benjamin.brachi@inrae.fr"
  • press enter
  • check the file name. If you didn’t have keys the default is fine.
  • if you did, maybe you want to add a little something to the name to recognize it.
    • I used /Users/bbrachi/.ssh/id_ed25519_gitinit
  • skip the passphrase (simply press enter twice)

Let’s create an SSH key pair

  • This procedure produces two files in the .ssh folder:
    • id_ed25519_gitinit
      • the local (private) half of the key pair (proves your identity)
    • id_ed25519_gitinit.pub
      • the public key (lets the server check your identity), the one you put in gitlab (or on the cluster)

DO NOT SHARE YOUR PRIVATE SSH KEY!!!

It allows anyone to connect to Gitlab (or the cluster) pretending to be you!

Add it to your gitlab account

Copy it to the clipboard:

  • Linux (requires xclip):
xclip -sel clip < ~/.ssh/id_ed25519_gitinit.pub
  • Windows (in Git Bash):
cat ~/.ssh/id_ed25519_gitinit.pub | clip

Add it to your gitlab account

  1. Go to your browser and your gitlab page
  2. On the left sidebar, select your avatar
  3. Select “Edit profile”
  4. On the left sidebar, select “SSH Keys”
  5. Select “Add new key”
  6. In the Key box paste your key (starts with ssh-ed25519)
  7. In the Title box, type a description, like “Work Laptop” or “Home Workstation”
  8. Set the Usage type of the key to the default: “Authentication & Signing”
  9. Don’t set an expiration date
  10. Click on “Add Key”.

Let’s check if the ssh key works

ssh -T git@forgemia.inra.fr

If it is the first time you connect, you will see something like this (with the forgemia host name and its own fingerprint):

The authenticity of host 'gitlab.example.com (35.231.145.151)' can't be established.
ECDSA key fingerprint is SHA256:HbW3g8zUjNSksFbqTiUWPWg2Bq1x8xdGUrliXFzSnUw.
Are you sure you want to continue connecting (yes/no)?

Type “yes” and enter.

It should say “Welcome to GitLab, @username!”
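
If the test fails although the key was added to gitlab, one possible cause is the key’s file name: ssh only tries keys with default names (such as id_ed25519) on its own. The sketch below assumes the key was saved as ~/.ssh/id_ed25519_gitinit, as in the example above, and declares it in ~/.ssh/config; adapt the file name to your own setup.

```shell
# Assumption: the key pair was saved under the non-default name id_ed25519_gitinit.
config="$HOME/.ssh/config"
mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"

# Append the entry only once, so running this twice does not duplicate it
if ! grep -qs "id_ed25519_gitinit" "$config"; then
  cat >> "$config" <<'EOF'
Host forgemia.inra.fr
    User git
    IdentityFile ~/.ssh/id_ed25519_gitinit
    IdentitiesOnly yes
EOF
fi
chmod 600 "$config"   # ssh expects this file to be writable by its owner only
```

After that, running the “ssh -T” test again should pick up the right key.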

Now let’s practice add/commit/push again

  • Modify the README.md file in your repository, maybe add a section and insert an image:
## Inserting an image:

![An image from the internet](https://vickysteeves.gitlab.io/repro-papers/img/final-doc.jpg)
  • Add/Stage the changes:
git add README.md
  • Committing changes with a short description:
git commit -m "adding an image to the README file"
  • Pushing to the remote repository!
git push
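
If “git push” still asks for a password at this point, the remote may be registered with an HTTPS address rather than the SSH one. The following sketch shows how to inspect and change it in a throwaway repository; the HTTPS address is an assumption that follows the usual Gitlab pattern, and the project path is the example one used in this course.

```shell
demo=$(mktemp -d)
git init --quiet "$demo"

# Assumed HTTPS form of the example project address
git -C "$demo" remote add origin https://forgemia.inra.fr/bbrachi/initiation_git_bbrachi.git
git -C "$demo" remote -v          # shows which address is used for fetch and push

# Switch the same remote to the SSH address, which uses your key instead of a password
git -C "$demo" remote set-url origin git@forgemia.inra.fr:bbrachi/initiation_git_bbrachi.git
git -C "$demo" remote get-url origin
```

In your real repository, the same two commands are run from inside the project folder, without the “-C” option.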

Managing code and data files: the .gitignore file

Often, your project will mix data and analysis scripts. So we need to keep things tidy.

  • Mainly script files (and other small files - e.g. small images for this presentation) should be synchronised.

  • Large data files must be ignored and managed another way.

The “.gitignore” file helps you to do that!

# Create a .gitignore file so that git ignores RStudio project files and all files ending with the following extensions:
printf ".Rproj.user\n*.Rproj\n*.mp4\n*.fastq\n*.xlsx\n*.docx\n" > .gitignore

Note

  • Files whose names start with a dot are hidden files on Unix systems.
  • You can have several “.gitignore” files per project. Each file applies to the folder it is in and to all its subfolders.
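
To check that the patterns do what you expect, “git check-ignore” reports which rule (if any) matches a given file. A minimal sketch in a throwaway repository, with hypothetical file names:

```shell
demo=$(mktemp -d)
cd "$demo"
git init --quiet
printf ".Rproj.user\n*.Rproj\n*.mp4\n*.fastq\n*.xlsx\n*.docx\n" > .gitignore

mkdir -p data scripts
touch data/reads.fastq scripts/analysis.R   # hypothetical file names

# check-ignore prints the matching rule, and exits with status 1 when nothing matches
git check-ignore -v data/reads.fastq
git check-ignore -v scripts/analysis.R || echo "scripts/analysis.R is not ignored"

# Ignored files do not show up in the status
git status --short
```

A file that was committed before it matched a pattern stays tracked; “git rm --cached” followed by the file name stops tracking it while keeping it on disk.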

How to share large data files with collaborators?

What file size are we talking about?

  • Less than a few hundred kB ==> can be synchronized using git (but beware not to synchronize all your unnecessary files).
  • More than 1 MB ==> you must use another storage to share this file (dataverse, EBI, NCBI) and document how to retrieve it in the README.md.
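
Before running “git add .”, it can help to list the files that exceed this limit. The sketch below uses a hypothetical “list_large_files” helper and a threshold of 1 MiB (close to the 1 MB rule of thumb above); the dummy files only serve the demonstration.

```shell
# Hypothetical helper: list files larger than 1 MiB, skipping git's own .git folder.
list_large_files() {
  # -size counts 512-byte blocks, so +2048 means "more than 1 MiB"
  find "$1" -name .git -prune -o -type f -size +2048 -print
}

demo=$(mktemp -d)
dd if=/dev/zero of="$demo/big_data.fastq" bs=1024 count=2048 2>/dev/null   # 2 MiB dummy file
echo "x <- 1" > "$demo/small_script.R"

list_large_files "$demo"   # only the large dummy file is listed
```

Files reported this way are candidates for the .gitignore file and for external storage.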

The importance of a tidy project structure!

It is important to have a tidy project structure in order to make collaborative work easier! Cf. first slides of this presentation.

What did we learn?

  • We can create a git repository from a folder we have on our computer
  • We’ve set up our account to use the ssh key
  • We know how to add, commit and push changes to the repository
  • We can avoid tracking files that are too big or that don’t need to be tracked, using .gitignore

These are basically the things you need when you work alone on a project

Now let’s see how to collaborate in a (very) small group (adviser/student)

Collaborating

Branches

  • Go to the left bar

  • Code > Repository graph

  • This is a representation of your repository’s history.

  • Notice how important the messages from your commits are!

  • We are all on the main branch of our repositories

Why create a branch?

  • The analyses I have done up to now are solid/robust
  • I have a couple of options for the next steps but I’m not sure which one I should do.
  • I’ll try one way and see how it goes.

Why create a branch?

  • My supervisor invites me to his repository (which has the basis for my work)
  • During my internship I will do new analyses. I create my own branch.
  • My supervisor may continue working on another branch as well.

Let’s play a game

  • team up with your neighbour
  • one will be the supervisor
  • the other will be the student

Supervisors: invite your student to your repository!

  • in your browser, select your project
  • in the left side bar > Manage > Members
  • in the top right: click on “Invite members”
  • start typing the name of your “student” (the fake one)
  • select the role “Developer”
  • Set an expiration date, or not.
  • Click on invite!

Students: retrieve your supervisor’s repository

  1. Go to your home page on gitlab
  2. You should see your supervisor’s repository
  3. Click on it and then on the left, unfold the clone button
  4. Copy the address with the little button for clone with ssh
cd ../ ## to leave your own working repository
git clone git@forgemia.inra.fr:bbrachi/initiation_git_lduva.git

This will copy your supervisor’s repository to your computer (works only if you have set up your ssh key, which has been done at this stage!).

Enter the directory

cd initiation_git_lduva

Pull before you start working!

Files and branches in a collaborative project are subject to changes by your collaborators, or by yourself on another computer for example.

Importance of always pulling before making any modification

Before you start working on a project always pull the latest version to your laptop.

In Practice…

  1. Supervisors: make a change to the Rmd file, then add, commit and push to the main branch (like we did before)

  2. Students: in your local copy of your supervisor’s repository, pull the latest changes:

git status
git pull

Now we work with branches

  • Starting from the work that already exists in the supervisor’s repository
  • both supervisor and students create their own branch to continue their work.
  • students create a branch called “student” and supervisors create a branch called “supervisor”

Pull the latest changes first; the checkout command then creates a new branch, called “student”, and switches to it at the same time:

git pull
git checkout -b student

It should say:

Switched to a new branch 'student'

Connecting local branches to the remote repository

This branch is local!

Connect it to a remote branch with:

git push --set-upstream origin student

You can visualize all branches with:

git branch --all

Tip

The little star indicates which branch you are working on. Remote branches start with “remotes/origin/…”
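
The branch steps above can be tried offline first. This is a sketch under the same assumptions as before: git 2.28 or later, a local bare repository standing in for the Gitlab remote, and a demo identity and README that are purely illustrative.

```shell
demo=$(mktemp -d)
git init --quiet --bare --initial-branch=main "$demo/remote.git"   # stand-in for the remote
git init --quiet --initial-branch=main "$demo/work"
cd "$demo/work"
git config user.name "Demo User"
git config user.email "demo@example.org"
git remote add origin "$demo/remote.git"

echo "# demo" > README.md
git add README.md
git commit --quiet -m "Initial commit"
git push --quiet --set-upstream origin main

git checkout --quiet -b student                  # new local branch, no remote counterpart yet
git push --quiet --set-upstream origin student   # create it on the remote and link the two

git branch --all   # local branches first, then the remotes/origin/... entries
```

Once the upstream is set, a plain “git push” or “git pull” on that branch is enough.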

Let’s work in our new branch

  • Students and supervisors, go to RStudio and add a revolutionary piece of code to make a very nice figure in the initiation_git.Rmd file
  • Supervisors have too many things to think about and forget to capitalize the S of Species!

Let’s work in our new branch

  • Because the students are smarter, they also indicate in the README.md that the R packages ggplot2 and GGally are required for this project to work.

  • (close the Rmd and README files when you’re done)

  • Both students and supervisors add, commit and push changes and check that all is up to date with git status

Push the changes to our new branch and see how the graph looks

  • Stage or add the changes
git add initiation_git.Rmd
git add README.md ## for students only
  • committing changes with a short description:
git commit -m "added a plot of Iris data using GGally"
  • Pushing to the remote repository!
git push

Now go back to the browser to visualize the repository graph!

You should see the two branches with their respective commits.

Merge into one version

  • Before the end of the internship, the supervisor needs to merge the student’s work back into their own.

  • Supervisors, you will merge the student branch into the supervisor branch.

  1. git status again to make sure all is up to date on your branch
  2. Make a local copy of the student branch
git pull
git checkout student

Merge into one version

  3. Switch back to your branch and merge the student branch into it
git checkout supervisor
git merge student

And it spits out:

Auto-merging initiation_git.Rmd
CONFLICT (content): Merge conflict in initiation_git.Rmd
Automatic merge failed; fix conflicts and then commit the result.

This means there are conflicts.

  • Look at the README.md. How does it look? Which version is it?
  • Look at the Rmd file. What do you see?

Conflicts in merge

The Rmd file should look something like this:

  • conflicts are delimited by the markers <<<<<<<, ======= and >>>>>>>
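
To see what a conflicted file looks like without touching your real project, a conflict can be provoked in a throwaway repository. In this sketch the file content is a hypothetical one-line plot command, and the capital “S” mirrors the exercise above; git 2.28 or later is assumed.

```shell
demo=$(mktemp -d)
git init --quiet --initial-branch=main "$demo"
cd "$demo"
git config user.name "Demo User"
git config user.email "demo@example.org"

echo "plot(iris)" > initiation_git.Rmd
git add initiation_git.Rmd
git commit --quiet -m "Initial commit"

# The student edits the line on one branch...
git checkout --quiet -b student
echo "plot(iris, main = 'Species')" > initiation_git.Rmd
git commit --quiet -am "student version"

# ...and the supervisor edits the same line on another branch
git checkout --quiet main
git checkout --quiet -b supervisor
echo "plot(iris, main = 'species')" > initiation_git.Rmd
git commit --quiet -am "supervisor version"

# Both branches changed the same line, so the merge stops with a conflict
git merge student || true
cat initiation_git.Rmd
```

In the printed file, the lines between “<<<<<<< HEAD” and “=======” come from the current branch (supervisor), and the lines between “=======” and “>>>>>>> student” come from the branch being merged.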

Conflicts in merge

  • HEAD refers to your current branch: the text below it is what was in your branch
  • “student” is the branch you were trying to merge into your branch.
  • Simply edit the file to keep the version you want (the student version) and delete the conflict markers.
  • then add, commit, and push as usual.
git add initiation_git.Rmd
git commit -m "resolved merge conflicts in Rmd"
git push

Now look at the repository graph in the browser!

Conclusion

What we’ve learned

  • add, commit, push
  • pull (very important)
  • collaborate
  • create branches (don’t hesitate to create branches)
  • merge branches
  • Not covered but it is possible to cancel the last commit, go back to any
    commit, delete branches, choose to only apply the changes from a commit even if on another branch…
  • you have git set up and ready to work
  • if you run into problems: Google, Stack Overflow, and the Gitlab help, which is very well made and searchable
  • Did we say “don’t forget to Pull changes before you start working”?

Additional commands

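git stash ## put uncommitted changes aside
git stash pop ## bring the stashed changes back
git cherry-pick <commit> ## apply the changes from one commit onto the current branch
git commit --amend ## modify the last commit (only if it has not been pushed)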
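
Two of these commands can be tried safely in a throwaway repository. The sketch below assumes git 2.28 or later; the file name, its content and the commit messages are hypothetical.

```shell
demo=$(mktemp -d)
git init --quiet --initial-branch=main "$demo"
cd "$demo"
git config user.name "Demo User"
git config user.email "demo@example.org"
echo "first line" > notes.txt
git add notes.txt
git commit --quiet -m "add notes"

# stash: put uncommitted changes aside, then bring them back
echo "work in progress" >> notes.txt
git stash push --quiet
git status --short          # prints nothing: the working tree is clean again
git stash pop --quiet
git status --short          # notes.txt is listed as modified again

# amend: fold the change into the last commit and fix its message
git commit --quiet -a --amend -m "add notes (with work in progress)"
git log --oneline           # still a single commit, with the new message
```

Amending rewrites the last commit, which is why it should be reserved for commits that have not been pushed yet.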